1 Business Problem

# Swire is looking to forecast commodity costs to support business strategy. They want to strike a balance between driving higher sales with a lower product price and pricing high enough to earn stronger margins. Forecasting costs will help them plan future pricing, and understanding commodity costs will also help Swire optimize its procurement strategy and inventory management. 
# The analysis will be successful if it accurately predicts price. Strong accuracy scores on held-out test data will give confidence in the model. The project will be executed by our team. We are looking not only for an accurate model but also for an interactive dashboard that is easy for Swire to use, allowing them to select individual elements of the prediction for customized observation. Any data found to improve predictions could be added in the future. 
# An important part of Swire's business is understanding commodity pricing. It impacts production, margins, and business planning. An accurate time-series forecast of price will support strategy: future planning is key for any business, and the forecast will help Swire plan its resources.

2 Analytical Objective

# The desired outcome is a strong forecast model for commodity price. Key metrics will include p-values, coefficients, and AIC values, among others. These metrics can be shared with stakeholders so they understand the strength of the model as they use it. Multiple models will be created to compare predictions and metrics, and after analyzing the performance of each model, the best one will be selected. The ultimate objective is to build a prediction model that Swire can use to inform decisions for future planning. 

3 Packages

library(quantmod)
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: TTR
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(ggplot2)
library(forecast)
library(astsa)
## 
## Attaching package: 'astsa'
## The following object is masked from 'package:forecast':
## 
##     gas
library(tseries)
library(tidyverse)
## ── Attaching packages ─────────────────────────────── tidyverse 1.3.2 ──
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ✔ purrr   0.3.5      
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::first()  masks xts::first()
## ✖ dplyr::lag()    masks stats::lag()
## ✖ dplyr::last()   masks xts::last()
library(caret)
## Loading required package: lattice
## 
## Attaching package: 'caret'
## 
## The following object is masked from 'package:purrr':
## 
##     lift
library(bigrquery)
library(prophet)
## Loading required package: Rcpp
## Loading required package: rlang
## 
## Attaching package: 'rlang'
## 
## The following objects are masked from 'package:purrr':
## 
##     %@%, as_function, flatten, flatten_chr, flatten_dbl, flatten_int,
##     flatten_lgl, flatten_raw, invoke, splice
library(lubridate)
## 
## Attaching package: 'lubridate'
## 
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(openxlsx)
library(dplyr)
library(readr)
library(feasts)
## Loading required package: fabletools
## 
## Attaching package: 'fabletools'
## 
## The following objects are masked from 'package:caret':
## 
##     MAE, RMSE
## 
## The following objects are masked from 'package:forecast':
## 
##     accuracy, forecast
library(tsibble)
## 
## Attaching package: 'tsibble'
## 
## The following object is masked from 'package:lubridate':
## 
##     interval
## 
## The following object is masked from 'package:zoo':
## 
##     index
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, union
library(Metrics)
## 
## Attaching package: 'Metrics'
## 
## The following object is masked from 'package:fabletools':
## 
##     accuracy
## 
## The following object is masked from 'package:rlang':
## 
##     ll
## 
## The following objects are masked from 'package:caret':
## 
##     precision, recall
## 
## The following object is masked from 'package:forecast':
## 
##     accuracy
library(rugarch)
## Loading required package: parallel
## 
## Attaching package: 'rugarch'
## 
## The following object is masked from 'package:fabletools':
## 
##     report
## 
## The following object is masked from 'package:purrr':
## 
##     reduce
## 
## The following object is masked from 'package:stats':
## 
##     sigma
library(rmgarch)
## Warning: package 'rmgarch' was built under R version 4.2.3
## 
## Attaching package: 'rmgarch'
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, last
## 
## The following objects are masked from 'package:xts':
## 
##     first, last

4 Data Load

coffee <- read.csv("Coffee_prices.csv", stringsAsFactors = FALSE) %>%
  mutate(date = ymd(date)) %>%
  mutate(month = format_ISO8601(date, precision = "ym"))

Aluminum <- read.csv("Aluminium_end_price.csv", stringsAsFactors = FALSE) %>%
  mutate(date = ymd(date)) %>%
  mutate(month = format_ISO8601(date, precision = "ym"))

sugar <- read.csv("sugar-prices-historical-chart-data.csv", stringsAsFactors = FALSE) %>%
  mutate(date = mdy(date)) %>%
  mutate(month = format_ISO8601(date, precision = "ym"))

soybean <- read.csv("soybean-prices-historical-chart-data.csv", stringsAsFactors = FALSE) %>%
  mutate(date = ymd(date)) %>%
  mutate(month = format_ISO8601(date, precision = "ym"))

soybeanoil <- read.csv("soybean-oil-prices-historical-chart-data.csv", stringsAsFactors = FALSE) %>%
  mutate(date = ymd(date)) %>%
  mutate(month = format_ISO8601(date, precision = "ym"))

corn <- read.csv("corn-prices-historical-chart-data.csv") %>%
  mutate(date = ymd(date)) %>%
  mutate(month = format_ISO8601(date, precision = "ym"))

cotton <- read.csv("cotton-prices-historical-chart-data.csv") %>%
  mutate(date = ymd(date)) %>%
  mutate(month = format_ISO8601(date, precision = "ym"))

5 Exploring Sugar

# Looking at the price of sugar over the last 30 years
ggplot(sugar, aes(date, price)) +
  geom_line() +
  labs(title = "Sugar Prices") + xlab("year") + ylab("prices")

#Summary of sugar
summary(sugar)
##      price             date               month          
##  Min.   :0.0438   Min.   :1993-01-04   Length:7581       
##  1st Qu.:0.0971   1st Qu.:2000-07-31   Class :character  
##  Median :0.1214   Median :2008-03-12   Mode  :character  
##  Mean   :0.1340   Mean   :2008-02-23                     
##  3rd Qu.:0.1672   3rd Qu.:2015-09-18                     
##  Max.   :0.3531   Max.   :2023-02-17
#Significant fluctuations in price.
  
#Sugar monthly average

sugar_m <- sugar %>%
  group_by(month) %>%
  mutate(mon_avg = mean(price)) %>%
  select(month, mon_avg)

#Drop Duplicate rows Sugar

sugar_m <- sugar_m[!duplicated(sugar_m),]

#Creating Time series data for Sugar

sugar_ts <- ts(sugar_m$mon_avg,start=c(1993),frequency=12)

#Plotting

chartSeries(sugar_ts)

#Looking at trends

autoplot(decompose(sugar_ts), main = "")

#Price spikes occur between 2010 and 2012 and again in the early 2020s

#Sugar Looking for daily or weekly trends

sugar_r <- sugar %>%
  filter(date >= as.Date('2022-11-01') & date <= as.Date('2023-01-31'))

ggplot(sugar_r, aes(date, price)) +
  geom_line() +
  labs(title = "Sugar Prices Daily") + xlab("time") + ylab("prices")

# There do not appear to be trends by day of the week. 
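# The eyeball check above can be made numeric by averaging price by day of week. A minimal sketch on synthetic daily data shaped like sugar_r (the frame and its values are illustrative, not the actual sugar data):

```r
library(dplyr)
library(lubridate)

# Illustrative daily price frame mirroring the sugar_r layout
set.seed(9)
df <- data.frame(
  date  = seq(as.Date("2022-11-01"), as.Date("2023-01-31"), by = "day"),
  price = 0.19 + cumsum(rnorm(92, sd = 0.001))
)

# Average price by day of week; near-equal means suggest no weekly pattern
dow_means <- df %>%
  mutate(dow = wday(date, label = TRUE)) %>%
  group_by(dow) %>%
  summarise(avg_price = mean(price), .groups = "drop")
```

# Applied to sugar_r itself, a large spread between the weekday means would warrant adding a weekly seasonal term.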


#Sugar Prices Controlling for inflation

#write.csv(sugar_m, file="sugar_m.csv",row.names = FALSE)

sugar_CPI <- read.csv("Sugar_cpi.csv")

sugar_adj <- left_join(sugar_m, sugar_CPI, by = c("month" = "date"))

sugar_adj$adj_price <- sugar_adj$mon_avg / sugar_adj$Value

sugar_ts_adj <- ts(sugar_adj$adj_price, start = c(1993), frequency = 12)


ggplot(sugar, aes(date, price)) +
  geom_line() +
  labs(title = "Sugar Prices") + xlab("year") + ylab("prices")

 chartSeries(sugar_ts)

ggplot(data = sugar_adj, aes(x = month, y = adj_price, group = 1)) +
  geom_line() +
  labs(x = "Month", y = "Adjusted Sugar Price", title = "Sugar Prices Adjusted for Inflation")

ggplot(sugar_adj, aes(x = month)) +
  geom_line(aes(y = Value, color = "Inflation Rate", group = 1)) +
  geom_line(aes(y = mon_avg, color = "Sugar Price", group = 1)) +
  scale_color_manual(values = c("blue", "red")) +
  xlab("Month") +
  ylab("Value") +
  ggtitle("Inflation Rate and Sugar Prices over Time")

sugar_adj$norm_value <- scale(sugar_adj$Value)
sugar_adj$norm_mon_avg <- scale(sugar_adj$mon_avg)

# Plot the normalized CPI ('Value') and monthly average price ('mon_avg') as lines

#changing month to a date
sugar_adj$month <- as.Date(paste0(sugar_adj$month, "-01"))

ggplot(sugar_adj, aes(x = month)) +
  geom_line(aes(y = norm_value, color = "CPI", group = 1)) +
  geom_line(aes(y = norm_mon_avg, color = "Sugar Price", group = 1)) +
  scale_color_manual(values = c("darkblue", "orange")) +
  xlab("Date") +
  ylab("Normalized Values") +
  ggtitle("Inflation Rate vs. Price - Sugar") +
  scale_x_date(date_labels = "%Y")

6 Arima Sugar

##Stationarity Test
adf.test(sugar_ts, alternative = "stationary")
## 
##  Augmented Dickey-Fuller Test
## 
## data:  sugar_ts
## Dickey-Fuller = -2.3631, Lag order = 7, p-value = 0.4236
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(sugar_ts), alternative ="stationary")
## Warning in adf.test(diff(sugar_ts), alternative = "stationary"): p-value smaller
## than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(sugar_ts)
## Dickey-Fuller = -7.8345, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
#First-order differencing drops the ADF p-value below 0.05, making the series stationary
#This sets the differencing order to (d)=1
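# As a cross-check on the differencing order chosen from the ADF tests, forecast::ndiffs() estimates the number of differences directly from a unit-root test. A minimal sketch on a synthetic random walk standing in for sugar_ts:

```r
library(forecast)

# Synthetic random walk: one difference should render it stationary
set.seed(42)
rw <- ts(cumsum(rnorm(200)), frequency = 12)

d_hat <- ndiffs(rw, test = "adf")  # estimated differencing order d
```

# Running ndiffs(sugar_ts, test = "adf") the same way gives an automated second opinion on d.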

#Custom ARIMA Model

#Correlation plot and tuning selection. The ACF measures how price correlates with itself at prior lags; the MA order (q) is chosen from the significant lags. 
#ACF (q)
acf(diff(sugar_ts),main='')

#q=2

#The PACF shows the direct impact of prior periods on the current period, controlling for intermediate lags
#PACF (p)
pacf(diff(sugar_ts),main='')

#p=2. There are 2 significant partial autocorrelation values

# ARIMA Custom

sugar_fit<- Arima(sugar_ts, order=c(2,1,2))

BIC(sugar_fit)
## [1] -2226.736
# Calculate the AICc value by hand. Note that the original logLik term cancels
# to zero (logLik(sugar_fit) - max(logLik(sugar_fit)) is always 0), so only the
# small-sample correction matters; Arima() also reports AICc directly.

n <- length(sugar_ts)
k <- length(coef(sugar_fit))
sugar_fit_AICc <- AIC(sugar_fit) + 2*k*(k+1)/(n-k-1)

sugar_fit_AICc
## [1] -2246.069
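# The hand calculation can be sanity-checked against the AICc that Arima() itself reports. A sketch on a simulated AR(1) series; note that Arima() counts sigma^2 as an estimated parameter, so k below is the coefficient count plus one (which is why the manual value above, using only the coefficient count, differs slightly from the AICc printed in the model summary):

```r
library(forecast)

set.seed(1)
y <- ts(arima.sim(list(ar = 0.5), n = 200), frequency = 12)
fit <- Arima(y, order = c(1, 0, 0))

# AICc = AIC + 2k(k+1)/(n-k-1), with k = number of estimated parameters
k <- length(coef(fit)) + 1   # + 1 for sigma^2
n <- fit$nobs
aicc_manual <- AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
```

# aicc_manual matches fit$aicc, confirming the parameter count convention.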
sugar_fit
## Series: sugar_ts 
## ARIMA(2,1,2) 
## 
## Coefficients:
##          ar1      ar2      ma1      ma2
##       0.9220  -0.0449  -0.5155  -0.3888
## s.e.  0.1624   0.1843   0.1571   0.1996
## 
## sigma^2 = 0.0001142:  log likelihood = 1128.09
## AIC=-2246.18   AICc=-2246.01   BIC=-2226.74
# ARIMA Alternate Custom

sugar_fit2<- Arima(sugar_ts, order=c(2,1,3))
sugar_fit2
## Series: sugar_ts 
## ARIMA(2,1,3) 
## 
## Coefficients:
##          ar1     ar2     ma1      ma2      ma3
##       0.0427  0.8469  0.3682  -0.9154  -0.4054
## s.e.  0.1680  0.1584  0.1714   0.0975   0.0836
## 
## sigma^2 = 0.0001144:  log likelihood = 1128.25
## AIC=-2244.51   AICc=-2244.27   BIC=-2221.17
forecast::accuracy(sugar_fit2)
##                        ME       RMSE         MAE         MPE    MAPE      MASE
## Training set 0.0005395214 0.01060471 0.007528664 0.009326486 5.68556 0.2399881
##                      ACF1
## Training set -0.006954922
#Check residuals

checkresiduals(sugar_fit2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(2,1,3)
## Q* = 29.583, df = 19, p-value = 0.05735
## 
## Model df: 5.   Total lags used: 24
#Auto-fit Arima

auto_sugar<- auto.arima(sugar_ts)
auto_sugar
## Series: sugar_ts 
## ARIMA(0,1,1)(0,0,2)[12] 
## 
## Coefficients:
##          ma1    sma1    sma2
##       0.4213  0.1814  0.0903
## s.e.  0.0488  0.0524  0.0537
## 
## sigma^2 = 0.0001105:  log likelihood = 1133.31
## AIC=-2258.62   AICc=-2258.51   BIC=-2243.06
##Forecast Plot

##Forecast Custom

autoplot(forecast::forecast(sugar_fit, h=12, level=c(80,95)))

##Forecast Custom 2

autoplot(forecast::forecast(sugar_fit2, h=12, level=c(80,95)))

##Forecast Auto

autoplot(forecast::forecast(auto_sugar, h=12, level=c(80,95)))

#ARIMA using more recent data

sugar_r2 <- sugar %>%
  filter(date >= as.Date('2019-01-01') & date <= as.Date('2023-01-31'))

sugar_r2 <- sugar_r2 %>%
  group_by(month) %>%
  mutate(mon_avg = mean(price)) %>%
  select(month, mon_avg)

#Drop Duplicate rows Sugar

sugar_r2 <- sugar_r2[!duplicated(sugar_r2),]

sugar_tsr <- ts(sugar_r2$mon_avg,start=c(2019),frequency=12)

##Stationarity Test
adf.test(sugar_tsr, alternative = "stationary")
## 
##  Augmented Dickey-Fuller Test
## 
## data:  sugar_tsr
## Dickey-Fuller = -2.4651, Lag order = 3, p-value = 0.388
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(sugar_tsr), alternative ="stationary")
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(sugar_tsr)
## Dickey-Fuller = -3.5424, Lag order = 3, p-value = 0.04751
## alternative hypothesis: stationary
#First-order differencing brings the ADF p-value just under 0.05
#Given the borderline result, second-order differencing (d)=2 is used in the model below

#Custom ARIMA Model

#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(sugar_tsr),main='')

#q=0

#The PACF shows the direct impact of prior periods on the current period, controlling for intermediate lags
#PACF (p)
pacf(diff(sugar_tsr),main='')

#p=0. No significant partial autocorrelation values remain after differencing

# ARIMA Custom

sugar_fitR<- Arima(sugar_tsr, order=c(0,2,0))
sugar_fitR
## Series: sugar_tsr 
## ARIMA(0,2,0) 
## 
## sigma^2 = 0.0001337:  log likelihood = 142.93
## AIC=-283.86   AICc=-283.77   BIC=-282.01
forecast::accuracy(sugar_fitR)
##                         ME       RMSE         MAE        MPE     MAPE      MASE
## Training set -7.134492e-05 0.01132439 0.008501457 0.02670897 6.032758 0.3216336
##                    ACF1
## Training set -0.2904199
#Check residuals

checkresiduals(sugar_fitR)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,2,0)
## Q* = 23.23, df = 10, p-value = 0.00993
## 
## Model df: 0.   Total lags used: 10
#Auto-fit Arima

auto_sugarR<- auto.arima(sugar_tsr)
auto_sugarR
## Series: sugar_tsr 
## ARIMA(0,1,1)(0,0,1)[12] 
## 
## Coefficients:
##          ma1    sma1
##       0.3711  0.5497
## s.e.  0.1442  0.2408
## 
## sigma^2 = 6.251e-05:  log likelihood = 163.02
## AIC=-320.04   AICc=-319.49   BIC=-314.42
##Forecast Plot

autoplot(forecast::forecast(auto_sugarR, h=12, level=c(80,95)))

##Forecast Custom

autoplot(forecast::forecast(sugar_fitR, h=12, level=c(80,95)))

#Printing Predictions

sugar_predictions <- forecast::forecast(auto_sugarR,h=12)

print(sugar_predictions$mean)
##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 2023           0.1980707 0.2017292 0.2008702 0.1972369 0.1956776 0.1919627
## 2024 0.2038595                                                            
##            Aug       Sep       Oct       Nov       Dec
## 2023 0.1876254 0.1872578 0.1898181 0.1962718 0.2008475
## 2024
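# For the interactive dashboard, a forecast object can be flattened into a tidy data frame with one row per month. A sketch using a simulated series in place of auto_sugarR (the column names are illustrative choices, not a fixed schema):

```r
library(forecast)

# Simulated monthly price series standing in for the sugar data
set.seed(7)
y <- ts(cumsum(rnorm(120, sd = 0.01)) + 0.15, start = c(2013, 1), frequency = 12)
fc <- forecast(auto.arima(y), h = 12, level = c(80, 95))

# One row per forecast month: point forecast plus 80% and 95% interval bounds
fc_df <- data.frame(
  time = as.numeric(time(fc$mean)),
  mean = as.numeric(fc$mean),
  lo80 = as.numeric(fc$lower[, 1]),
  hi80 = as.numeric(fc$upper[, 1]),
  lo95 = as.numeric(fc$lower[, 2]),
  hi95 = as.numeric(fc$upper[, 2])
)
```

# A frame in this shape feeds directly into a dashboard table or ribbon plot, letting users select individual elements of the prediction.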

7 Sugar GARCH

# Model Creation

garch_model <- ugarchspec(
  variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
  mean.model = list(armaOrder = c(0, 0), include.mean = TRUE),
  distribution.model = "std"
)
# Fitting Model to Data

sugar_garch <- ugarchfit(spec = garch_model, data = sugar_ts)

sugar_vol <-ts(sugar_garch@fit$sigma^2,start=c(1993),frequency=12)


print(sugar_garch)
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##          Estimate  Std. Error   t value Pr(>|t|)
## mu       0.116391    0.001092 106.62385 0.000000
## omega    0.000051    0.000012   4.22398 0.000024
## alpha1   0.977824    0.094878  10.30608 0.000000
## beta1    0.021175    0.043050   0.49188 0.622804
## shape   99.999556   70.368010   1.42109 0.155289
## 
## Robust Standard Errors:
##          Estimate  Std. Error  t value Pr(>|t|)
## mu       0.116391    0.002521 46.16956 0.000000
## omega    0.000051    0.000015  3.34398 0.000826
## alpha1   0.977824    0.052135 18.75546 0.000000
## beta1    0.021175    0.038591  0.54871 0.583201
## shape   99.999556   10.484237  9.53809 0.000000
## 
## LogLikelihood : 756.3023 
## 
## Information Criteria
## ------------------------------------
##                     
## Akaike       -4.1508
## Bayes        -4.0971
## Shibata      -4.1512
## Hannan-Quinn -4.1295
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      248.9       0
## Lag[2*(p+q)+(p+q)-1][2]     345.9       0
## Lag[4*(p+q)+(p+q)-1][5]     597.8       0
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      4.891 0.02700
## Lag[2*(p+q)+(p+q)-1][5]     7.625 0.03648
## Lag[4*(p+q)+(p+q)-1][9]     8.266 0.11428
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]    0.2124 0.500 2.000  0.6449
## ARCH Lag[5]    1.0171 1.440 1.667  0.7279
## ARCH Lag[7]    1.0949 2.315 1.543  0.8976
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  10.7911
## Individual Statistics:             
## mu     4.1788
## omega  0.4136
## alpha1 0.2436
## beta1  0.6172
## shape  8.4986
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.28 1.47 1.88
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           1.4952 0.1357    
## Negative Sign Bias  0.5157 0.6064    
## Positive Sign Bias  0.6158 0.5384    
## Joint Effect        2.7621 0.4298    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     325.8    9.816e-58
## 2    30     358.8    1.509e-58
## 3    40     356.7    6.333e-53
## 4    50     390.5    9.785e-55
## 
## 
## Elapsed time : 0.175035
#plot(sugar_garch, which = 1)

coef(sugar_garch)
##           mu        omega       alpha1        beta1        shape 
## 1.163906e-01 5.093451e-05 9.778237e-01 2.117538e-02 9.999956e+01
# Forecasting

horizon <- 3

sugar_forecast_garch <- ugarchforecast(sugar_garch, n.ahead = horizon)

forecast_mean_sugar <- as.numeric(sugar_forecast_garch@forecast$seriesFor)
actual_values_sugar <- as.numeric(window(sugar_vol, start = c(1993, 1)))


#plot(sugar_forecast_garch, n.plot = horizon, n.col = 1, plot.type = "single", 
    # main = "GARCH Forecast for Sugar Prices", ylab = "Price", xlab = "Time") #%>%
#lines(sugar_ts[(length(sugar_ts)-horizon+1):length(sugar_ts)], col = "blue")


#GARCH Take 2
#Use the ARIMA orders identified from the ACF/PACF above (fit as sugar_fit)
sugar_fit
## Series: sugar_ts 
## ARIMA(2,1,2) 
## 
## Coefficients:
##          ar1      ar2      ma1      ma2
##       0.9220  -0.0449  -0.5155  -0.3888
## s.e.  0.1624   0.1843   0.1571   0.1996
## 
## sigma^2 = 0.0001142:  log likelihood = 1128.09
## AIC=-2246.18   AICc=-2246.01   BIC=-2226.74
sugar_fit2
## Series: sugar_ts 
## ARIMA(2,1,3) 
## 
## Coefficients:
##          ar1     ar2     ma1      ma2      ma3
##       0.0427  0.8469  0.3682  -0.9154  -0.4054
## s.e.  0.1680  0.1584  0.1714   0.0975   0.0836
## 
## sigma^2 = 0.0001144:  log likelihood = 1128.25
## AIC=-2244.51   AICc=-2244.27   BIC=-2221.17
sugar_fitR
## Series: sugar_tsr 
## ARIMA(0,2,0) 
## 
## sigma^2 = 0.0001337:  log likelihood = 142.93
## AIC=-283.86   AICc=-283.77   BIC=-282.01
auto_sugar
## Series: sugar_ts 
## ARIMA(0,1,1)(0,0,2)[12] 
## 
## Coefficients:
##          ma1    sma1    sma2
##       0.4213  0.1814  0.0903
## s.e.  0.0488  0.0524  0.0537
## 
## sigma^2 = 0.0001105:  log likelihood = 1133.31
## AIC=-2258.62   AICc=-2258.51   BIC=-2243.06
auto_sugarR
## Series: sugar_tsr 
## ARIMA(0,1,1)(0,0,1)[12] 
## 
## Coefficients:
##          ma1    sma1
##       0.3711  0.5497
## s.e.  0.1442  0.2408
## 
## sigma^2 = 6.251e-05:  log likelihood = 163.02
## AIC=-320.04   AICc=-319.49   BIC=-314.42
#Based on the significance results above, let's try the auto.arima orders in the GARCH mean model

garch_model2 <- ugarchspec(
  variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
  mean.model = list(armaOrder = c(1, 0), include.mean = TRUE),
  distribution.model = "std"
)

sugar_garch2 <- ugarchfit(spec = garch_model2, data = sugar_ts)
sugar_garch2
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##          Estimate  Std. Error   t value Pr(>|t|)
## mu       0.116391    0.001092 106.62385 0.000000
## omega    0.000051    0.000012   4.22398 0.000024
## alpha1   0.977824    0.094878  10.30608 0.000000
## beta1    0.021175    0.043050   0.49188 0.622804
## shape   99.999556   70.368010   1.42109 0.155289
## 
## Robust Standard Errors:
##          Estimate  Std. Error  t value Pr(>|t|)
## mu       0.116391    0.002521 46.16956 0.000000
## omega    0.000051    0.000015  3.34398 0.000826
## alpha1   0.977824    0.052135 18.75546 0.000000
## beta1    0.021175    0.038591  0.54871 0.583201
## shape   99.999556   10.484237  9.53809 0.000000
## 
## LogLikelihood : 756.3023 
## 
## Information Criteria
## ------------------------------------
##                     
## Akaike       -4.1508
## Bayes        -4.0971
## Shibata      -4.1512
## Hannan-Quinn -4.1295
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      248.9       0
## Lag[2*(p+q)+(p+q)-1][2]     345.9       0
## Lag[4*(p+q)+(p+q)-1][5]     597.8       0
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      4.891 0.02700
## Lag[2*(p+q)+(p+q)-1][5]     7.625 0.03648
## Lag[4*(p+q)+(p+q)-1][9]     8.266 0.11428
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]    0.2124 0.500 2.000  0.6449
## ARCH Lag[5]    1.0171 1.440 1.667  0.7279
## ARCH Lag[7]    1.0949 2.315 1.543  0.8976
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  10.7911
## Individual Statistics:             
## mu     4.1788
## omega  0.4136
## alpha1 0.2436
## beta1  0.6172
## shape  8.4986
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.28 1.47 1.88
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           1.4952 0.1357    
## Negative Sign Bias  0.5157 0.6064    
## Positive Sign Bias  0.6158 0.5384    
## Joint Effect        2.7621 0.4298    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     325.8    9.816e-58
## 2    30     358.8    1.509e-58
## 3    40     356.7    6.333e-53
## 4    50     390.5    9.785e-55
## 
## 
## Elapsed time : 0.1337819
#Conditional variance (volatility) series from the standard GARCH(1,1) fit

sugar_vol <-ts(sugar_garch2@fit$sigma^2,start=c(1993),frequency=12)

plot(sugar_vol,xlab="",ylab="",main="Sugar_Volatility (GARCH[1,1])")

#Exponential GARCH (does not work quite as well)
Egarch_model <- ugarchspec(
  variance.model = list(model = "eGARCH", garchOrder = c(1, 1)),
  mean.model = list(armaOrder = c(1, 0), include.mean = TRUE),
  distribution.model = "std"
)

sugar_egarch2 <- ugarchfit(spec = Egarch_model, data = sugar_ts)
sugar_egarch2
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##          Estimate  Std. Error   t value Pr(>|t|)
## mu       0.116391    0.001092 106.62385 0.000000
## omega    0.000051    0.000012   4.22398 0.000024
## alpha1   0.977824    0.094878  10.30608 0.000000
## beta1    0.021175    0.043050   0.49188 0.622804
## shape   99.999556   70.368010   1.42109 0.155289
## 
## Robust Standard Errors:
##          Estimate  Std. Error  t value Pr(>|t|)
## mu       0.116391    0.002521 46.16956 0.000000
## omega    0.000051    0.000015  3.34398 0.000826
## alpha1   0.977824    0.052135 18.75546 0.000000
## beta1    0.021175    0.038591  0.54871 0.583201
## shape   99.999556   10.484237  9.53809 0.000000
## 
## LogLikelihood : 756.3023 
## 
## Information Criteria
## ------------------------------------
##                     
## Akaike       -4.1508
## Bayes        -4.0971
## Shibata      -4.1512
## Hannan-Quinn -4.1295
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      248.9       0
## Lag[2*(p+q)+(p+q)-1][2]     345.9       0
## Lag[4*(p+q)+(p+q)-1][5]     597.8       0
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      4.891 0.02700
## Lag[2*(p+q)+(p+q)-1][5]     7.625 0.03648
## Lag[4*(p+q)+(p+q)-1][9]     8.266 0.11428
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]    0.2124 0.500 2.000  0.6449
## ARCH Lag[5]    1.0171 1.440 1.667  0.7279
## ARCH Lag[7]    1.0949 2.315 1.543  0.8976
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  10.7911
## Individual Statistics:             
## mu     4.1788
## omega  0.4136
## alpha1 0.2436
## beta1  0.6172
## shape  8.4986
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.28 1.47 1.88
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           1.4952 0.1357    
## Negative Sign Bias  0.5157 0.6064    
## Positive Sign Bias  0.6158 0.5384    
## Joint Effect        2.7621 0.4298    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     325.8    9.816e-58
## 2    30     358.8    1.509e-58
## 3    40     356.7    6.333e-53
## 4    50     390.5    9.785e-55
## 
## 
## Elapsed time : 0.1366649
coef(sugar_egarch2)
##           mu        omega       alpha1        beta1        shape 
## 1.163906e-01 5.093451e-05 9.778237e-01 2.117538e-02 9.999956e+01
sugar_forecast_garche <- ugarchforecast(sugar_egarch2, n.ahead = horizon)

forecast_mean_sugare <- as.numeric(sugar_forecast_garche@forecast$seriesFor)
actual_values_sugare <- as.numeric(window(sugar_vol, start = c(1993, 1)))

#Conditional variance (volatility) series from the eGARCH fit above

esugar_vol <-ts(sugar_egarch2@fit$sigma^2,start=c(1993),frequency=12)

#plot(esugar_vol,xlab="",ylab="",main="Sugar_Volatility (eGARCH[1,1])")

cor(sugar_vol, esugar_vol)
## [1] 1
ts.plot(sugar_vol,esugar_vol,col=c("green","red"),xlab="")
legend("topright",legend=c("Standard","Exponential"),col=c("green","red"),lty=c(1,1))

#No difference shown

#GARCH 3

names(sugar_garch2@model)
##  [1] "modelinc"   "modeldesc"  "modeldata"  "pars"       "start.pars"
##  [6] "fixed.pars" "maxOrder"   "pos.matrix" "fmodel"     "pidx"      
## [11] "n.start"
names(sugar_garch2@fit)
##  [1] "hessian"         "cvar"            "var"             "sigma"          
##  [5] "condH"           "z"               "LLH"             "log.likelihoods"
##  [9] "residuals"       "coef"            "robust.cvar"     "A"              
## [13] "B"               "scores"          "se.coef"         "tval"           
## [17] "matcoef"         "robust.se.coef"  "robust.tval"     "robust.matcoef" 
## [21] "fitted.values"   "convergence"     "kappa"           "persistence"    
## [25] "timer"           "ipars"           "solver"
#Variance
sugar_garch_var <- sugar_garch2@fit$var
#Residuals
sugar_garch_res <- (sugar_garch2@fit$residuals)^2

#Plotting residuals and conditional variances
plot(sugar_garch_res, type = "l")
lines(sugar_garch_var, col = "green")

sugar_forecast_garch2 <- ugarchforecast(sugar_garch2, n.ahead = 12)
sugar_forecast_garch2
## 
## *------------------------------------*
## *       GARCH Model Forecast         *
## *------------------------------------*
## Model: sGARCH
## Horizon: 12
## Roll Steps: 0
## Out of Sample: 0
## 
## 0-roll forecast [T0=Feb 2023]:
##      Series   Sigma
## T+1  0.1164 0.09664
## T+2  0.1164 0.09685
## T+3  0.1164 0.09707
## T+4  0.1164 0.09728
## T+5  0.1164 0.09749
## T+6  0.1164 0.09770
## T+7  0.1164 0.09792
## T+8  0.1164 0.09813
## T+9  0.1164 0.09834
## T+10 0.1164 0.09855
## T+11 0.1164 0.09876
## T+12 0.1164 0.09896
sugar_forecast_values <- as.numeric(sugar_forecast_garch2@forecast$series)
print(sugar_forecast_values)
##  [1] 0.1163906 0.1163906 0.1163906 0.1163906 0.1163906 0.1163906 0.1163906
##  [8] 0.1163906 0.1163906 0.1163906 0.1163906 0.1163906
sugar_forecast_index <- sugar_forecast_garch2@forecast$seriesFor
sugar_forecast <- data.frame(time = sugar_forecast_index, forecast = sugar_forecast_values)
print(sugar_forecast)
##       Feb.2023  forecast
## T+1  0.1163906 0.1163906
## T+2  0.1163906 0.1163906
## T+3  0.1163906 0.1163906
## T+4  0.1163906 0.1163906
## T+5  0.1163906 0.1163906
## T+6  0.1163906 0.1163906
## T+7  0.1163906 0.1163906
## T+8  0.1163906 0.1163906
## T+9  0.1163906 0.1163906
## T+10 0.1163906 0.1163906
## T+11 0.1163906 0.1163906
## T+12 0.1163906 0.1163906
#write.csv(sugar_forecast, file="sugar_forecast.csv",row.names = FALSE)

sugar_garch2_fitted <- fitted(sugar_garch2)
#print(sugar_garch2_fitted)


#print(sugar_forecast_values)

#summary(sugar_forecast_garch2)

ug_sugar <- sugar_forecast_garch2@forecast$sigmaFor
#plot(ug_sugar, type = "l")

sug_var_t <- c(tail(sugar_garch_var,20),rep(NA,12))  # last 20 observations, NA-padded for the 12-step forecast
sug_res_t <- c(tail(sugar_garch_res,20),rep(NA,12))  # last 20 observations, NA-padded for the 12-step forecast
sug_f <- c(rep(NA,20),(ug_sugar)^2)                  # forecast variance aligned after the history

#plot(sug_res_t, type = "l") #Residuals
lines(sug_f, col = "orange") # Predictions 
lines(sug_var_t, col = "green") #Conditional Forecast

#Plot Predictions

sug_mean_forecast <- as.numeric(sugar_forecast_garch2@forecast$seriesFor)

# Approximate upper forecast bounds from the mean and sigma forecasts
# (rugarch forecast objects expose seriesFor and sigmaFor, not interval columns)
sug_sigma_forecast <- as.numeric(sugar_forecast_garch2@forecast$sigmaFor)
sug_conf_int_95 <- sug_mean_forecast + 1.96 * sug_sigma_forecast # upper 95% bound (normal approximation)
sug_conf_int_80 <- sug_mean_forecast + 1.28 * sug_sigma_forecast # upper 80% bound (normal approximation)

# Plot the mean forecasted values with the two confidence intervals

#plot(sugar_forecast_garch2, main = "Forecasted coffee Prices (GARCH(1,1))")

8 Sugar Model Comparison

#ARIMA Models
forecast::accuracy(sugar_fit)
##                        ME       RMSE         MAE        MPE     MAPE      MASE
## Training set 0.0003955322 0.01061327 0.007561439 -0.0316936 5.710755 0.2410328
##                      ACF1
## Training set -0.001346786
forecast::accuracy(sugar_fit2)
##                        ME       RMSE         MAE         MPE    MAPE      MASE
## Training set 0.0005395214 0.01060471 0.007528664 0.009326486 5.68556 0.2399881
##                      ACF1
## Training set -0.006954922
forecast::accuracy(auto_sugar)
##                        ME      RMSE         MAE        MPE     MAPE      MASE
## Training set 0.0002209403 0.0104548 0.007462305 0.04185269 5.656067 0.2378728
##                      ACF1
## Training set -0.005221658
forecast::accuracy(sugar_fitR)
##                         ME       RMSE         MAE        MPE     MAPE      MASE
## Training set -7.134492e-05 0.01132439 0.008501457 0.02670897 6.032758 0.3216336
##                    ACF1
## Training set -0.2904199
forecast::accuracy(auto_sugarR)
##                        ME        RMSE         MAE       MPE    MAPE      MASE
## Training set 0.0007288199 0.007660754 0.005699677 0.3909611 3.96685 0.2156345
##                     ACF1
## Training set -0.04453725
AIC(sugar_fit2)
## [1] -2244.505
BIC(sugar_fit2)
## [1] -2221.172
#GARCH Model

actual_values_sugar <- as.numeric(window(sugar_ts))
#Note: head() aligns the forecast with the earliest observations;
#tail() would compare against the most recent data instead
actual_values_sugar <- head(actual_values_sugar, length(forecast_mean_sugar))

mae <- mean(abs(forecast_mean_sugar - actual_values_sugar))
mse <- mean((forecast_mean_sugar - actual_values_sugar)^2)
rmse <- sqrt(mse)
# Print the results
cat(paste("MAE: ", mae, "\n"))
## MAE:  0.0221829354226844
cat(paste("MSE: ", mse, "\n"))
## MSE:  0.000625796954725992
cat(paste("RMSE: ", rmse, "\n"))
## RMSE:  0.0250159340166621
#GARCH Model 2

forecast_mean_sugar2 <- as.numeric(sugar_forecast_garch2@forecast$seriesFor)
actual_values_sugar2 <- as.numeric(window(sugar_ts))
actual_values_sugar2 <- head(actual_values_sugar2, length(forecast_mean_sugar2))


sug_mae <- mean(abs(forecast_mean_sugar2 - actual_values_sugar2))
sug_mse <- mean((forecast_mean_sugar2 - actual_values_sugar2)^2)
sug_rmse <- sqrt(sug_mse)
# Print the results
cat(paste("MAE: ", sug_mae, "\n"))
## MAE:  0.014777997147131
cat(paste("MSE: ", sug_mse, "\n"))
## MSE:  0.000302609589889141
cat(paste("RMSE: ", sug_rmse, "\n"))
## RMSE:  0.0173956773334395
#eGARCH

forecast_mean_sugare <- as.numeric(sugar_forecast_garche@forecast$seriesFor)
actual_values_sugare <- as.numeric(window(sugar_ts))
actual_values_sugare <- head(actual_values_sugare, length(forecast_mean_sugare))


esug_mae <- mean(abs(forecast_mean_sugare - actual_values_sugare))
esug_mse <- mean((forecast_mean_sugare - actual_values_sugare)^2)
esug_rmse <- sqrt(esug_mse)
# Print the results
cat(paste("MAE: ", esug_mae, "\n"))
## MAE:  0.0221829354226844
cat(paste("MSE: ", esug_mse, "\n"))
## MSE:  0.000625796954725992
cat(paste("RMSE: ", esug_rmse, "\n"))
## RMSE:  0.0250159340166621
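The MAE/MSE/RMSE arithmetic above is repeated by hand for each GARCH variant, which invites copy-paste slips between the `sug_`, `esug_`, and `coffee_` blocks. A small helper (illustrative; the function name is not from the original analysis) collects the three metrics in one place:

```r
# Illustrative helper: MAE, MSE, and RMSE for a forecast/actual pair
# (function name is an assumption, not from the report)
forecast_metrics <- function(forecast, actual) {
  stopifnot(length(forecast) == length(actual))
  err <- forecast - actual
  c(MAE = mean(abs(err)), MSE = mean(err^2), RMSE = sqrt(mean(err^2)))
}

forecast_metrics(c(1.1, 0.9, 1.0), c(1.0, 1.0, 1.0))
```

Each comparison block then reduces to a single call such as `forecast_metrics(forecast_mean_sugar2, actual_values_sugar2)`.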

9 Exploring Coffee

# Looking at the price of Coffee over the last 30 years
ggplot(coffee, aes(date, value)) +
  geom_line() +
  labs(title = "Coffee Prices by Year")

#Summary of coffee
summary(coffee)
##       date                value           month          
##  Min.   :1993-01-04   Min.   :0.4250   Length:7582       
##  1st Qu.:2000-07-31   1st Qu.:0.9995   Class :character  
##  Median :2008-03-12   Median :1.2070   Mode  :character  
##  Mean   :2008-02-23   Mean   :1.2816                     
##  3rd Qu.:2015-09-17   3rd Qu.:1.5115                     
##  Max.   :2023-02-17   Max.   :3.1480
##Significant changes in price

#Coffee monthly average

coffee_m <- coffee %>% group_by(month) %>% mutate(mon_avg = mean(value))%>%
select(month, mon_avg)

#Drop Duplicate rows

coffee_m <- coffee_m[!duplicated(coffee_m),]

#Creating Time series data

coffee_ts <- ts(coffee_m$mon_avg,start=c(1993),frequency=12)

#Plotting

chartSeries(coffee_ts)

## Significant price increases in the mid-1990s, 2010-2012, and the early 2020s

#Looking at trends

autoplot(decompose((coffee_ts)),main="") 

#Coffee: looking for daily or weekly trends
coffee_r <- coffee %>%
filter(date >= as.Date('2022-11-01') & date <= as.Date('2023-01-31'))

ggplot(coffee_r, aes(date, value)) +
  geom_line() +
  labs(title = "Coffee Prices Daily") + xlab("time") + ylab("prices")

## No trends by day of week seen

10 Coffee Arima

##Stationary Test
adf.test(coffee_ts, alternative = "stationary")
## 
##  Augmented Dickey-Fuller Test
## 
## data:  coffee_ts
## Dickey-Fuller = -3.2651, Lag order = 7, p-value = 0.07714
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(coffee_ts), alternative ="stationary")
## Warning in adf.test(diff(coffee_ts), alternative = "stationary"): p-value
## smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(coffee_ts)
## Dickey-Fuller = -5.9334, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
#First-order differencing drops the p-value below 0.05
#The differenced series is stationary, so d = 1

#Custom Arima Model

#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(coffee_ts),main='')

#q=4

#Looking at the impact of prior-period prices on subsequent periods
#PACF (p)
pacf(diff(coffee_ts),main='')

#p=4. There are 4 significant partial autocorrelation lags

# ARIMA Custom (best fit)

coffee_fit<- Arima(coffee_ts, order=c(4,1,4))
coffee_fit
## Series: coffee_ts 
## ARIMA(4,1,4) 
## 
## Coefficients:
##          ar1     ar2      ar3     ar4      ma1      ma2     ma3      ma4
##       0.5954  0.0770  -0.9527  0.5519  -0.4654  -0.0217  0.9312  -0.4830
## s.e.  0.2993  0.0552   0.0632  0.2614   0.3128   0.0566  0.0562   0.2851
## 
## sigma^2 = 0.01271:  log likelihood = 279.38
## AIC=-540.75   AICc=-540.24   BIC=-505.75
# ARIMA Alternate Custom

coffee_fit2<- Arima(coffee_ts, order=c(4,1,5))
coffee_fit2
## Series: coffee_ts 
## ARIMA(4,1,5) 
## 
## Coefficients:
##           ar1     ar2     ar3     ar4     ma1      ma2      ma3      ma4
##       -0.1247  0.6060  0.2163  0.1729  0.2508  -0.4745  -0.2219  -0.3044
## s.e.   0.2759  0.2615  0.2585  0.2339  0.2706   0.2508   0.2373   0.2231
##           ma5
##       -0.2263
## s.e.   0.0604
## 
## sigma^2 = 0.01249:  log likelihood = 282.85
## AIC=-545.69   AICc=-545.07   BIC=-506.81
#Check residuals

checkresiduals(coffee_fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(4,1,4)
## Q* = 18.839, df = 16, p-value = 0.2771
## 
## Model df: 8.   Total lags used: 24
#Auto-fit Arima

auto_coffee<- auto.arima(coffee_ts)
auto_coffee
## Series: coffee_ts 
## ARIMA(0,1,3) 
## 
## Coefficients:
##          ma1     ma2     ma3
##       0.1286  0.1538  0.0843
## s.e.  0.0521  0.0550  0.0520
## 
## sigma^2 = 0.01278:  log likelihood = 276.15
## AIC=-544.3   AICc=-544.19   BIC=-528.74
##Forecast Plots

##Forecast Custom

autoplot(forecast::forecast(coffee_fit, h=12, level=c(80,95)))

##Forecast Custom 2

autoplot(forecast::forecast(coffee_fit2, h=12, level=c(80,95)))

##Forecast Auto

autoplot(forecast::forecast(auto_coffee, h=12, level=c(80,95)))

#ARIMA using more recent data

coffee_r2 <- coffee %>%
filter(date >= as.Date('2010-01-01') & date <= as.Date('2023-01-31'))

coffee_r2 <- coffee_r2 %>% group_by(month) %>% mutate(mon_avg = mean(value))%>%
select(month, mon_avg)

#Drop Duplicate rows Coffee

coffee_r2 <- coffee_r2[!duplicated(coffee_r2),]

coffee_tsr <- ts(coffee_r2$mon_avg,start=c(2010),frequency=12)

##Stationary Test
adf.test(coffee_tsr, alternative = "stationary")
## 
##  Augmented Dickey-Fuller Test
## 
## data:  coffee_tsr
## Dickey-Fuller = -2.949, Lag order = 5, p-value = 0.1807
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(coffee_tsr), alternative ="stationary")
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(coffee_tsr)
## Dickey-Fuller = -3.3631, Lag order = 5, p-value = 0.06326
## alternative hypothesis: stationary
#Differencing lowers the p-value to about 0.06, just above the 0.05 cutoff
#We treat the differenced series as approximately stationary, so d = 1

#Custom Arima Model

#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(coffee_tsr),main='')

#q=6

#Looking at the impact of prior-period prices on subsequent periods
#PACF (p)
pacf(diff(coffee_tsr),main='')

#p=2. There are 2 significant partial autocorrelation lags

# ARIMA Custom

coffee_fitR<- Arima(coffee_tsr, order=c(2,1,6))
coffee_fitR
## Series: coffee_tsr 
## ARIMA(2,1,6) 
## 
## Coefficients:
##           ar1     ar2     ma1      ma2      ma3    ma4     ma5     ma6
##       -0.1766  0.5327  0.4315  -0.4325  -0.1015  0.058  0.1799  0.3457
## s.e.   0.1415  0.1311  0.1478   0.1425   0.0938  0.084  0.1008  0.0898
## 
## sigma^2 = 0.01069:  log likelihood = 136.07
## AIC=-254.13   AICc=-252.9   BIC=-226.68
#Check residuals

checkresiduals(coffee_fitR)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(2,1,6)
## Q* = 18.687, df = 16, p-value = 0.2853
## 
## Model df: 8.   Total lags used: 24
##Forecast Custom

autoplot(forecast::forecast(coffee_fitR, h=12, level=c(80,95)))

11 Coffee GARCH

# Model Creation
garch_model <- ugarchspec(variance.model = list(model = "sGARCH",
                                                garchOrder = c(1,1)),
                          mean.model = list(armaOrder = c(0,0),
                                            include.mean = TRUE),
                          distribution.model = "std")
# Fitting Model to Data

coffee_garch <- ugarchfit(spec = garch_model, data = coffee_ts)

coffee_vol <-ts(coffee_garch@fit$sigma^2,start=c(1993),frequency=12)


print(coffee_garch)
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      1.163668    0.012917  90.0876 0.000000
## omega   0.006320    0.001610   3.9246 0.000087
## alpha1  0.963469    0.103053   9.3493 0.000000
## beta1   0.035531    0.051711   0.6871 0.492023
## shape  41.707308   39.423928   1.0579 0.290093
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      1.163668    0.033799 34.42924 0.000000
## omega   0.006320    0.002121  2.97960 0.002886
## alpha1  0.963469    0.079086 12.18254 0.000000
## beta1   0.035531    0.049899  0.71205 0.476434
## shape  41.707308   54.714844  0.76227 0.445901
## 
## LogLikelihood : -47.06522 
## 
## Information Criteria
## ------------------------------------
##                     
## Akaike       0.28765
## Bayes        0.34141
## Shibata      0.28728
## Hannan-Quinn 0.30902
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      216.3       0
## Lag[2*(p+q)+(p+q)-1][2]     295.2       0
## Lag[4*(p+q)+(p+q)-1][5]     490.1       0
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      4.550 0.03291
## Lag[2*(p+q)+(p+q)-1][5]     4.743 0.17488
## Lag[4*(p+q)+(p+q)-1][9]     4.993 0.43046
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]   0.04656 0.500 2.000  0.8292
## ARCH Lag[5]   0.38595 1.440 1.667  0.9166
## ARCH Lag[7]   0.51618 2.315 1.543  0.9769
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  2.9057
## Individual Statistics:              
## mu     2.03276
## omega  0.03716
## alpha1 0.45032
## beta1  0.20222
## shape  0.12263
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.28 1.47 1.88
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           1.3570 0.1756    
## Negative Sign Bias  0.2075 0.8357    
## Positive Sign Bias  0.8410 0.4009    
## Joint Effect        2.4352 0.4871    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     277.3    8.600e-48
## 2    30     326.2    5.166e-52
## 3    40     323.3    1.835e-46
## 4    50     338.8    5.870e-45
## 
## 
## Elapsed time : 0.1964259
#plot(coffee_garch, which = 1)

coef(coffee_garch)
##           mu        omega       alpha1        beta1        shape 
##  1.163667853  0.006319659  0.963469397  0.035530613 41.707307511
# Forecasting

horizon <- 3

coffee_forecast_garch <- ugarchforecast(coffee_garch, n.ahead = horizon)

forecast_mean_coffee <- as.numeric(coffee_forecast_garch@forecast$seriesFor)
#Actual values for comparison are constructed in the model-comparison section below


#plot(coffee_forecast_garch, n.plot = horizon, n.col = 1, plot.type = "single", 
    # main = "GARCH Forecast for coffee Prices", ylab = "Price", xlab = "Time") #%>%
#lines(coffee_ts[(length(coffee_ts)-horizon+1):length(coffee_ts)], col = "blue")


#Based on the significance results, let's add an AR(1) term to the mean model

garch_model <- ugarchspec(variance.model = list(model = "sGARCH",
                                                garchOrder = c(1,1)),
                          mean.model = list(armaOrder = c(1,0),
                                            include.mean = TRUE),
                          distribution.model = "std")

coffee_garch2 <- ugarchfit(spec = garch_model, data = coffee_ts)
coffee_garch2
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(1,0,0)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.685232    0.069468   9.8640 0.000000
## ar1     0.979045    0.008883 110.2205 0.000000
## omega   0.001102    0.000596   1.8498 0.064345
## alpha1  0.313564    0.100026   3.1348 0.001720
## beta1   0.652411    0.098098   6.6506 0.000000
## shape   4.793099    1.230126   3.8964 0.000098
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.685232    0.030481  22.4807 0.000000
## ar1     0.979045    0.012618  77.5882 0.000000
## omega   0.001102    0.000740   1.4897 0.136295
## alpha1  0.313564    0.112132   2.7964 0.005168
## beta1   0.652411    0.110367   5.9113 0.000000
## shape   4.793099    1.329729   3.6046 0.000313
## 
## LogLikelihood : 336.509 
## 
## Information Criteria
## ------------------------------------
##                     
## Akaike       -1.8260
## Bayes        -1.7615
## Shibata      -1.8266
## Hannan-Quinn -1.8004
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic   p-value
## Lag[1]                      6.856 8.833e-03
## Lag[2*(p+q)+(p+q)-1][2]     9.614 3.734e-09
## Lag[4*(p+q)+(p+q)-1][5]    12.488 8.611e-05
## d.o.f=1
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                     0.0524 0.81894
## Lag[2*(p+q)+(p+q)-1][5]    7.1030 0.04896
## Lag[4*(p+q)+(p+q)-1][9]    8.9404 0.08360
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]    0.1976 0.500 2.000  0.6566
## ARCH Lag[5]    1.0086 1.440 1.667  0.7304
## ARCH Lag[7]    1.7858 2.315 1.543  0.7625
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  1.1403
## Individual Statistics:             
## mu     0.4216
## ar1    0.1803
## omega  0.1394
## alpha1 0.1449
## beta1  0.1563
## shape  0.3658
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.49 1.68 2.12
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value    prob sig
## Sign Bias          1.50505 0.13320    
## Negative Sign Bias 0.62289 0.53375    
## Positive Sign Bias 0.02833 0.97741    
## Joint Effect       7.06926 0.06972   *
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     29.27      0.06184
## 2    30     35.46      0.18988
## 3    40     38.22      0.50522
## 4    50     46.29      0.58376
## 
## 
## Elapsed time : 0.105289
#Time series of the conditional variance from the standard GARCH(1,1) model

coffee_vol <-ts(coffee_garch2@fit$sigma^2,start=c(1993),frequency=12)

plot(coffee_vol,xlab="",ylab="",main="Coffee Volatility (GARCH(1,1))")


#GARCH 3

names(coffee_garch2@model)
##  [1] "modelinc"   "modeldesc"  "modeldata"  "pars"       "start.pars"
##  [6] "fixed.pars" "maxOrder"   "pos.matrix" "fmodel"     "pidx"      
## [11] "n.start"
names(coffee_garch2@fit)
##  [1] "hessian"         "cvar"            "var"             "sigma"          
##  [5] "condH"           "z"               "LLH"             "log.likelihoods"
##  [9] "residuals"       "coef"            "robust.cvar"     "A"              
## [13] "B"               "scores"          "se.coef"         "tval"           
## [17] "matcoef"         "robust.se.coef"  "robust.tval"     "robust.matcoef" 
## [21] "fitted.values"   "convergence"     "kappa"           "persistence"    
## [25] "timer"           "ipars"           "solver"
#Variance
coffee_garch_var <- coffee_garch2@fit$var
#Residuals
coffee_garch_res <- (coffee_garch2@fit$residuals)^2

#Plotting residuals and conditional variances
plot(coffee_garch_res, type = "l")
lines(coffee_garch_var, col = "green")

coffee_forecast_garch2 <- ugarchforecast(coffee_garch2, n.ahead = 12)
coffee_forecast_garch2
## 
## *------------------------------------*
## *       GARCH Model Forecast         *
## *------------------------------------*
## Model: sGARCH
## Horizon: 12
## Roll Steps: 0
## Out of Sample: 0
## 
## 0-roll forecast [T0=Feb 2023]:
##      Series  Sigma
## T+1   1.753 0.1716
## T+2   1.731 0.1719
## T+3   1.709 0.1721
## T+4   1.687 0.1724
## T+5   1.666 0.1727
## T+6   1.646 0.1729
## T+7   1.626 0.1732
## T+8   1.606 0.1734
## T+9   1.587 0.1736
## T+10  1.568 0.1739
## T+11  1.549 0.1741
## T+12  1.531 0.1743
ug_coffee <- coffee_forecast_garch2@forecast$sigmaFor
plot(ug_coffee, type = "l")

coffee_var_t <- c(tail(coffee_garch_var,20),rep(NA,12))  # last 20 observations, padded for the 12-step forecast
coffee_res_t <- c(tail(coffee_garch_res,20),rep(NA,12))  # last 20 observations, padded for the 12-step forecast
coffee_f <- c(rep(NA,20),(ug_coffee)^2)

plot(coffee_res_t, type = "l") #Residuals
lines(coffee_f, col = "orange") # Predictions 
lines(coffee_var_t, col = "green") #Conditional Forecast

#Plot Predictions
#plot(coffee_forecast_garch2, main = "Forecasted coffee Prices (GARCH(1,1))")
#legend("bottomright", legend = c("Mean", "Lower 95% CI", "Upper 95% CI"), col = c("black", "blue", "red"), lty = 1)

coffee_mean_forecast <- as.numeric(coffee_forecast_garch2@forecast$seriesFor)

# Plot the mean forecasted values with the two confidence intervals

#plot(coffee_forecast_garch2, main = "Forecasted coffee Prices (GARCH(1,1))")

12 Coffee Model Comparison

#ARIMA Models
forecast::accuracy(coffee_fit)
##                       ME      RMSE        MAE          MPE     MAPE      MASE
## Training set 0.002422129 0.1113441 0.07793565 -0.008238448 6.054242 0.2064315
##                      ACF1
## Training set -0.007994279
forecast::accuracy(coffee_fit2)
##                      ME      RMSE        MAE        MPE     MAPE      MASE
## Training set 0.00694441 0.1102114 0.07716969 0.03490727 5.963348 0.2044027
##                      ACF1
## Training set -0.003319408
forecast::accuracy(auto_coffee)
##                       ME      RMSE        MAE           MPE     MAPE      MASE
## Training set 0.002408695 0.1124349 0.07772975 -0.0001296227 6.015002 0.2058861
##                       ACF1
## Training set -0.0005311462
forecast::accuracy(coffee_fitR)
##                         ME      RMSE        MAE          MPE     MAPE      MASE
## Training set -0.0001687003 0.1003911 0.07550881 -0.001675586 4.964701 0.1660969
##                      ACF1
## Training set 0.0005172509
AIC(coffee_fit)
## [1] -540.751
BIC(coffee_fit)
## [1] -505.7511
AIC(coffee_fit2)
## [1] -545.6944
BIC(coffee_fit2)
## [1] -506.8056
AIC(coffee_fitR)
## [1] -254.1334
BIC(coffee_fitR)
## [1] -226.6847
#GARCH Model

actual_values_coffee <- as.numeric(window(coffee_ts))
actual_values_coffee <- head(actual_values_coffee, length(forecast_mean_coffee))

mae <- mean(abs(forecast_mean_coffee - actual_values_coffee))
mse <- mean((forecast_mean_coffee - actual_values_coffee)^2)
rmse <- sqrt(mse)

# Print the results
cat(paste("MAE: ", mae, "\n"))
## MAE:  0.525243672859614
cat(paste("MSE: ", mse, "\n"))
## MSE:  0.27628538402018
cat(paste("RMSE: ", rmse, "\n"))
## RMSE:  0.52562856088704
#GARCH Model 2

forecast_mean_coffee2 <- as.numeric(coffee_forecast_garch2@forecast$seriesFor)
actual_values_coffee2 <- as.numeric(window(coffee_ts))
actual_values_coffee2 <- head(actual_values_coffee2, length(forecast_mean_coffee2))


coffee_mae <- mean(abs(forecast_mean_coffee2 - actual_values_coffee2))
coffee_mse <- mean((forecast_mean_coffee2 - actual_values_coffee2)^2)
coffee_rmse <- sqrt(coffee_mse)
# Print the results
cat(paste("MAE: ", coffee_mae, "\n"))
## MAE:  0.952327512190766
cat(paste("MSE: ", coffee_mse, "\n"))
## MSE:  0.925844918652174
cat(paste("RMSE: ", coffee_rmse, "\n"))
## RMSE:  0.962208355114512

13 Exploring Aluminum

#Looking at the price of Aluminum over the last 20 years
ggplot(Aluminum, aes(date, Price)) +
  geom_line() +
  labs(title = "Aluminum Prices by Day")

#Summary Aluminum
summary(Aluminum)
##      Price           date               month          
##  Min.   :   0   Min.   :2002-10-23   Length:4953       
##  1st Qu.:1770   1st Qu.:2008-05-30   Class :character  
##  Median :1971   Median :2013-04-24   Mode  :character  
##  Mean   :2071   Mean   :2013-04-10                     
##  3rd Qu.:2390   3rd Qu.:2018-03-15                     
##  Max.   :3849   Max.   :2023-02-16
## The lower-quartile price is less than half of the highest price; the minimum of 0 suggests missing records.
  
#Aluminum monthly average

alum_m <- Aluminum %>% group_by(month) %>% mutate(mon_avg = mean(Price))%>%
select(month, mon_avg)

#Drop Duplicate rows Aluminum

alum_m <- alum_m[!duplicated(alum_m),]

#Creating Time series data Aluminum

alum_ts <- ts(alum_m$mon_avg,start=c(2002,10),frequency=12) # series begins October 2002

#Plotting Aluminum

chartSeries(alum_ts)

#Huge spike in price in the 2000s, unique among these commodities

#Looking at trends
autoplot(decompose((alum_ts)),main="") 

#Aluminum: looking for daily or weekly trends
Aluminum_r <- Aluminum %>%
filter(date >= as.Date('2022-12-31') & date <= as.Date('2023-01-31'))

ggplot(Aluminum_r, aes(date, Price)) +
  geom_line() +
  labs(title = "Aluminum Prices Daily") + xlab("time") + ylab("prices")

## No weekly trends shown through plotting


#Aluminum Prices Controlling for inflation


alum_CPI <- read.csv("alum_cpi.csv")

alum_adj <- left_join(alum_m, alum_CPI, by = c("month" = "date"))

alum_adj$adj_price <- alum_adj$mon_avg / alum_adj$Value

alum_ts_adj <- ts(alum_adj$adj_price, start = c(2002, 10), frequency = 12) # aligned with the unadjusted series


chartSeries(alum_ts_adj)

ggplot(data = alum_adj, aes(x = month, y = adj_price, group = 1)) +
  geom_line() +
  labs(x = "Month", y = "Adjusted Aluminum Price", title = "Aluminum Prices Adjusted for Inflation")
## Warning: Removed 3 rows containing missing values (`geom_line()`).

#Inflation vs. Price without scaling
ggplot(alum_adj, aes(x = month)) +
  geom_line(aes(y = Value, color = "Inflation Rate", group = 1)) +
  geom_line(aes(y = mon_avg, color = "Aluminum Price", group = 1)) +
  scale_color_manual(values = c("blue", "red")) +
  xlab("Month") +
  ylab("Value") +
  ggtitle("Inflation Rate and Aluminum Prices over Time")
## Warning: Removed 3 rows containing missing values (`geom_line()`).

# Scaling the data
alum_adj$norm_value <- scale(alum_adj$Value)
alum_adj$norm_mon_avg <- scale(alum_adj$mon_avg)

# Plot the normalized values for 'Value' and 'mon_avg' and the adjusted alum price as lines

alum_adj$month <- as.Date(paste0(alum_adj$month, "-01"))

ggplot(alum_adj, aes(x = month)) +
  geom_line(aes(y = norm_value, color = "CPI", group = 1)) +
  geom_line(aes(y = norm_mon_avg, color = "Aluminum Price", group = 1)) +
  scale_color_manual(values = c("darkblue", "orange")) +
  xlab("Date") +
  ylab("Normalized Values") +
  ggtitle("Inflation Rate vs. Price - Aluminum") +
  scale_x_date(date_labels = "%Y")
## Warning: Removed 3 rows containing missing values (`geom_line()`).
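The "Removed 3 rows" warnings come from months where the CPI join produced NA adjusted prices. A base-R sketch of checking and dropping those rows before plotting (the helper name is an assumption; column names follow the chunk above):

```r
# Illustrative: count and drop rows where the CPI join left NA adjusted prices
drop_na_adjusted <- function(df, col = "adj_price") {
  missing <- is.na(df[[col]])
  message(sum(missing), " rows with missing ", col, " dropped")
  df[!missing, , drop = FALSE]
}
```

Running the plots on `drop_na_adjusted(alum_adj)` would silence the warnings while making the dropped rows explicit.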

14 Aluminum ARIMA

##Stationary Test
adf.test(alum_ts, alternative = "stationary")
## 
##  Augmented Dickey-Fuller Test
## 
## data:  alum_ts
## Dickey-Fuller = -2.6022, Lag order = 6, p-value = 0.3224
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(alum_ts), alternative ="stationary")
## Warning in adf.test(diff(alum_ts), alternative = "stationary"): p-value smaller
## than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(alum_ts)
## Dickey-Fuller = -6.0687, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary
#First-order differencing drops the p-value below 0.05
#The differenced series is stationary, so d = 1

#Custom Arima Model

#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(alum_ts),main='')

#q=3

#Looking at the impact of prior-period prices on subsequent periods
#PACF (p)
pacf(diff(alum_ts),main='')

#p=3. There are 3 significant partial autocorrelation lags

# ARIMA Custom (best fit)

aluminum_fit<- Arima(alum_ts, order=c(3,1,3))
aluminum_fit
## Series: alum_ts 
## ARIMA(3,1,3) 
## 
## Coefficients:
##          ar1      ar2     ar3      ma1     ma2      ma3
##       1.1325  -1.2485  0.3465  -0.8396  1.0051  -0.0065
## s.e.  0.1714   0.1397  0.1696   0.1818  0.1572   0.1835
## 
## sigma^2 = 11595:  log likelihood = -1487.02
## AIC=2988.03   AICc=2988.51   BIC=3012.51
# ARIMA Alternate Custom

aluminum_fit2<- Arima(alum_ts, order=c(3,1,2))
aluminum_fit2
## Series: alum_ts 
## ARIMA(3,1,2) 
## 
## Coefficients:
##           ar1     ar2      ar3     ma1      ma2
##       -0.3140  0.4679  -0.0601  0.6432  -0.3134
## s.e.   1.1728  0.7328   0.3159  1.1674   1.1311
## 
## sigma^2 = 11831:  log likelihood = -1488.07
## AIC=2988.13   AICc=2988.49   BIC=3009.11
#Check residuals

checkresiduals(aluminum_fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(3,1,3)
## Q* = 23.776, df = 18, p-value = 0.1625
## 
## Model df: 6.   Total lags used: 24
#Auto-fit Arima

auto_aluminum<- auto.arima(alum_ts)
auto_aluminum
## Series: alum_ts 
## ARIMA(2,0,1)(2,0,1)[12] with non-zero mean 
## 
## Coefficients:
##          ar1      ar2      ma1     sar1     sar2    sma1       mean
##       1.3678  -0.4051  -0.0960  -0.1301  -0.1949  0.0914  2039.9543
## s.e.  0.2242   0.2184   0.2492   0.3281   0.0694  0.3354   128.7307
## 
## sigma^2 = 11389:  log likelihood = -1490.23
## AIC=2996.45   AICc=2997.06   BIC=3024.46
##Forecast Plots

##Forecast Custom

autoplot(forecast::forecast(aluminum_fit, h=12, level=c(80,95)))

##Forecast Custom 2

autoplot(forecast::forecast(aluminum_fit2, h=12, level=c(80,95)))

##Forecast Auto

autoplot(forecast::forecast(auto_aluminum, h=12, level=c(80,95)))

#ARIMA using more recent data

aluminum_r2 <- Aluminum %>%
filter(date >= as.Date('2019-01-01') & date <= as.Date('2023-01-31'))

aluminum_r2 <- aluminum_r2 %>% group_by(month) %>% mutate(mon_avg = mean(Price))%>%
select(month, mon_avg)

#Drop Duplicate rows Aluminum

aluminum_r2 <- aluminum_r2[!duplicated(aluminum_r2),]

alum_tsr <- ts(aluminum_r2$mon_avg,start=c(2019),frequency=12)

##Stationary Test
adf.test(alum_tsr, alternative = "stationary")
## 
##  Augmented Dickey-Fuller Test
## 
## data:  alum_tsr
## Dickey-Fuller = -2.2624, Lag order = 3, p-value = 0.4692
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(alum_tsr), alternative ="stationary")
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(alum_tsr)
## Dickey-Fuller = -3.464, Lag order = 3, p-value = 0.0574
## alternative hypothesis: stationary
#Differencing lowers the p-value to 0.0574, just above the 5% cutoff
#We treat the differenced series as approximately stationary, so (d)=1
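A quick base-R sketch of why first differencing helps: a random walk is non-stationary in the mean, but its first difference recovers the stationary shocks. The series below is simulated, not the aluminum data:

```r
# Hypothetical illustration: difference a random walk
set.seed(42)
shocks <- rnorm(200)      # stationary innovations
walk   <- cumsum(shocks)  # random walk: non-stationary
d1     <- diff(walk)      # first difference

# diff() of a cumulative sum returns the original increments
stopifnot(all.equal(d1, shocks[-1]))
```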

#Custom Arima Model

#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(alum_tsr),main='')

#q=1

#Measuring the direct effect each prior period's price has on the current period
#PACF (p)
pacf(diff(alum_tsr),main='')

#p=2. Two significant partial autocorrelation lags
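How the plots translate into (p, q): on a simulated AR(1) series (toy data, not the aluminum series), the PACF is large at lag 1 and near zero afterwards, which is the pattern read off to pick p. `acf()`/`pacf()` with `plot = FALSE` return the underlying values:

```r
# Hypothetical AR(1) series to show how acf()/pacf() feed order selection
set.seed(7)
x <- as.numeric(arima.sim(model = list(ar = 0.6), n = 500))

a <- acf(x, plot = FALSE)$acf    # lags 0, 1, 2, ...
p <- pacf(x, plot = FALSE)$acf   # lags 1, 2, ...

stopifnot(a[1] == 1)             # ACF at lag 0 is always 1
stopifnot(abs(p[1]) > abs(p[2])) # AR(1): PACF cuts off after lag 1
```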

# ARIMA Custom

aluminum_fitR<- Arima(alum_tsr, order=c(2,1,1))
aluminum_fitR
## Series: alum_tsr 
## ARIMA(2,1,1) 
## 
## Coefficients:
##           ar1     ar2     ma1
##       -0.2423  0.1572  0.8567
## s.e.   0.1952  0.1712  0.1305
## 
## sigma^2 = 15008:  log likelihood = -297.69
## AIC=603.39   AICc=604.32   BIC=610.87
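The information criteria printed above can be reproduced from the log likelihood. This sketch assumes 48 effective observations (49 monthly averages for 2019-01 through 2023-01, less one lost to differencing) and k = 4 estimated parameters (ar1, ar2, ma1, plus sigma^2):

```r
# Reconstructing AIC/BIC from the reported ARIMA(2,1,1) fit
loglik <- -297.69
k <- 4     # ar1, ar2, ma1, plus sigma^2
n <- 48    # effective observations after differencing (assumed)

aic <- -2 * loglik + 2 * k
bic <- -2 * loglik + k * log(n)

# matches the printed AIC=603.39 and BIC=610.87 to rounding
stopifnot(abs(aic - 603.39) < 0.05, abs(bic - 610.87) < 0.05)
```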
#Check residuals

checkresiduals(aluminum_fitR)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(2,1,1)
## Q* = 11.527, df = 7, p-value = 0.1172
## 
## Model df: 3.   Total lags used: 10
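`checkresiduals()` wraps a Ljung-Box test whose degrees of freedom are reduced by the number of fitted ARMA parameters, which is why the output above reports df = 7 from 10 lags and 3 model parameters. A base-R sketch on hypothetical white-noise residuals (standing in for the model's residuals):

```r
# Ljung-Box test with fitdf, as used by checkresiduals()
set.seed(1)
res <- rnorm(49)  # toy residuals, not the model's

lb <- Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 3)

stopifnot(lb$parameter == 7)                  # df = lag - fitdf = 10 - 3
stopifnot(lb$p.value > 0 && lb$p.value < 1)   # a valid p-value
```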
##Forecast Custom

autoplot(forecast::forecast(aluminum_fitR, h=12, level=c(80,95)))

#Printing Predictions

aluminum_predictions <- forecast::forecast(aluminum_fitR,h=12)

print(aluminum_predictions$mean)
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 2023          1888.621 1880.203 1885.581 1882.954 1884.437 1883.664 1884.085
## 2024 1883.933                                                               
##           Sep      Oct      Nov      Dec
## 2023 1883.861 1883.982 1883.917 1883.952
## 2024

15 Aluminum GARCH

# Model Creation
garch_model <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
                          mean.model = list(armaOrder = c(0, 0), include.mean = TRUE),
                          distribution.model = "std")
# Fitting Model to Data



alum_fit <- ugarchfit(spec = garch_model, data = alum_ts)

print(alum_fit)
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##          Estimate  Std. Error  t value Pr(>|t|)
## mu     1.8323e+03  1.4411e+01 127.1517 0.000000
## omega  2.6125e+03  1.0188e+03   2.5644 0.010336
## alpha1 9.3815e-01  1.1299e-01   8.3027 0.000000
## beta1  6.0847e-02  5.9512e-02   1.0224 0.306580
## shape  1.0000e+02  7.5167e+01   1.3304 0.183393
## 
## Robust Standard Errors:
##          Estimate  Std. Error  t value Pr(>|t|)
## mu     1.8323e+03  3.9258e+01  46.6743 0.000000
## omega  2.6125e+03  1.1162e+03   2.3406 0.019254
## alpha1 9.3815e-01  7.6338e-02  12.2894 0.000000
## beta1  6.0847e-02  8.1280e-02   0.7486 0.454095
## shape  1.0000e+02  5.8292e+00  17.1549 0.000000
## 
## LogLikelihood : -1715.166 
## 
## Information Criteria
## ------------------------------------
##                    
## Akaike       14.042
## Bayes        14.114
## Shibata      14.041
## Hannan-Quinn 14.071
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      169.4       0
## Lag[2*(p+q)+(p+q)-1][2]     234.1       0
## Lag[4*(p+q)+(p+q)-1][5]     381.1       0
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic   p-value
## Lag[1]                      18.16 2.032e-05
## Lag[2*(p+q)+(p+q)-1][5]     18.92 3.828e-05
## Lag[4*(p+q)+(p+q)-1][9]     20.28 1.702e-04
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]   0.05877 0.500 2.000  0.8085
## ARCH Lag[5]   1.01577 1.440 1.667  0.7283
## ARCH Lag[7]   1.80840 2.315 1.543  0.7578
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  29.6025
## Individual Statistics:               
## mu      0.21228
## omega   0.08013
## alpha1  0.84836
## beta1   0.15114
## shape  14.97564
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.28 1.47 1.88
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           1.3182 0.1887    
## Negative Sign Bias  0.2831 0.7773    
## Positive Sign Bias  1.3221 0.1874    
## Joint Effect        2.6093 0.4559    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     177.4    9.826e-28
## 2    30     193.2    3.444e-26
## 3    40     200.2    1.498e-23
## 4    50     235.2    3.769e-26
## 
## 
## Elapsed time : 0.205163
#plot(alum_fit, which = 1)

coef(alum_fit)
##           mu        omega       alpha1        beta1        shape 
## 1.832340e+03 2.612539e+03 9.381534e-01 6.084659e-02 1.000000e+02
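What the fitted coefficients imply: the conditional variance follows the GARCH(1,1) recursion sigma2[t] = omega + alpha1 * eps[t-1]^2 + beta1 * sigma2[t-1]. A sketch using the reported estimates (rounded from `coef(alum_fit)`) with toy shocks, so the numbers are illustrative only. Note alpha1 + beta1 is about 0.999, so shocks to volatility decay very slowly:

```r
# GARCH(1,1) variance recursion with the reported alum_fit estimates
omega <- 2612.5; alpha1 <- 0.938; beta1 <- 0.061
set.seed(3)
eps <- rnorm(5, sd = 100)  # hypothetical shocks

sigma2 <- numeric(5)
sigma2[1] <- omega / (1 - alpha1 - beta1)  # unconditional variance start
for (t in 2:5) {
  sigma2[t] <- omega + alpha1 * eps[t - 1]^2 + beta1 * sigma2[t - 1]
}

stopifnot(all(sigma2 > 0))  # variance stays positive by construction
```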
# Forecasting

horizon <- 3

alum_forecast_garch <- ugarchforecast(alum_fit, n.ahead = horizon)

forecast_mean_alum <- as.numeric(alum_forecast_garch@forecast$seriesFor)
actual_values_alum <- as.numeric(window(alum_ts, start = c(1993, 1)))
## Warning in window.default(x, ...): 'start' value not changed
#plot(alum_forecast_garch, n.plot = horizon, n.col = 1, plot.type = "single", 
    # main = "GARCH Forecast for alum Prices", ylab = "Price", xlab = "Time") #%>%
#lines(alum_ts[(length(alum_ts)-horizon+1):length(alum_ts)], col = "blue")


#Based on the significance results, try an AR(1) mean model

garch_model <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
                          mean.model = list(armaOrder = c(1, 0), include.mean = TRUE),
                          distribution.model = "std")

alum_garch2 <- ugarchfit(spec = garch_model, data = alum_ts)
alum_garch2
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(1,0,0)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##          Estimate  Std. Error  t value Pr(>|t|)
## mu     2481.61680  103.331950  24.0160 0.000000
## ar1       1.00000    0.007311 136.7822 0.000000
## omega   485.14727  338.064049   1.4351 0.151266
## alpha1    0.24396    0.077693   3.1401 0.001689
## beta1     0.73411    0.067507  10.8745 0.000000
## shape     8.58310    4.196208   2.0454 0.040811
## 
## Robust Standard Errors:
##          Estimate  Std. Error  t value Pr(>|t|)
## mu     2481.61680   31.379364  79.0844 0.000000
## ar1       1.00000    0.010177  98.2653 0.000000
## omega   485.14727  354.651855   1.3680 0.171327
## alpha1    0.24396    0.066384   3.6750 0.000238
## beta1     0.73411    0.048017  15.2887 0.000000
## shape     8.58310    3.421984   2.5082 0.012134
## 
## LogLikelihood : -1464.623 
## 
## Information Criteria
## ------------------------------------
##                    
## Akaike       12.005
## Bayes        12.091
## Shibata      12.004
## Hannan-Quinn 12.040
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic   p-value
## Lag[1]                      8.385 3.784e-03
## Lag[2*(p+q)+(p+q)-1][2]     9.719 2.831e-09
## Lag[4*(p+q)+(p+q)-1][5]    10.983 4.068e-04
## d.o.f=1
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                  0.0000346  0.9953
## Lag[2*(p+q)+(p+q)-1][5] 0.1087920  0.9978
## Lag[4*(p+q)+(p+q)-1][9] 0.7211680  0.9950
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]    0.1206 0.500 2.000  0.7284
## ARCH Lag[5]    0.1928 1.440 1.667  0.9672
## ARCH Lag[7]    0.2565 2.315 1.543  0.9949
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  1.4826
## Individual Statistics:              
## mu     0.70646
## ar1    0.18760
## omega  0.09967
## alpha1 0.10988
## beta1  0.14469
## shape  0.20105
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.49 1.68 2.12
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           0.5388 0.5905    
## Negative Sign Bias  1.4627 0.1449    
## Positive Sign Bias  0.9102 0.3636    
## Joint Effect        4.0548 0.2556    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     27.57      0.09203
## 2    30     42.14      0.05452
## 3    40     42.51      0.32235
## 4    50     56.43      0.21707
## 
## 
## Elapsed time : 0.1483831
#Conditional-variance time series from the standard GARCH(1,1) fit

alum_vol <- ts(alum_garch2@fit$sigma^2, start = c(1993), frequency = 12)

#plot(alum_vol,xlab="",ylab="",main="alum_Volatility (GARCH[1,1])")


#Inspecting the fitted GARCH object

names(alum_garch2@model)
##  [1] "modelinc"   "modeldesc"  "modeldata"  "pars"       "start.pars"
##  [6] "fixed.pars" "maxOrder"   "pos.matrix" "fmodel"     "pidx"      
## [11] "n.start"
names(alum_garch2@fit)
##  [1] "hessian"         "cvar"            "var"             "sigma"          
##  [5] "condH"           "z"               "LLH"             "log.likelihoods"
##  [9] "residuals"       "coef"            "robust.cvar"     "A"              
## [13] "B"               "scores"          "se.coef"         "tval"           
## [17] "matcoef"         "robust.se.coef"  "robust.tval"     "robust.matcoef" 
## [21] "fitted.values"   "convergence"     "kappa"           "persistence"    
## [25] "timer"           "ipars"           "solver"
#Variance
alum_garch_var <- alum_garch2@fit$var
#Squared residuals
alum_garch_res <- (alum_garch2@fit$residuals)^2

#Plotting residuals and conditional variances
plot(alum_garch_res, type = "l")
lines(alum_garch_var, col = "green")

alum_forecast_garch2 <- ugarchforecast(alum_garch2, n.ahead = 12)
alum_forecast_garch2
## 
## *------------------------------------*
## *       GARCH Model Forecast         *
## *------------------------------------*
## Model: sGARCH
## Horizon: 12
## Roll Steps: 0
## Out of Sample: 0
## 
## 0-roll forecast [T0=May 2022]:
##      Series Sigma
## T+1    1338 51.44
## T+2    1338 55.44
## T+3    1338 59.08
## T+4    1338 62.45
## T+5    1338 65.57
## T+6    1338 68.48
## T+7    1338 71.22
## T+8    1338 73.80
## T+9    1338 76.24
## T+10   1338 78.55
## T+11   1338 80.74
## T+12   1338 82.84
ug_alum <- alum_forecast_garch2@forecast$sigmaFor
plot(ug_alum, type = "l")
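Why the forecast sigma in the table above climbs toward a ceiling: beyond one step ahead, a GARCH(1,1) variance forecast iterates sigma2[h] = omega + (alpha1 + beta1) * sigma2[h-1]. A sketch using the reported alum_garch2 estimates and the printed T+1 sigma of 51.44:

```r
# Multi-step GARCH(1,1) variance forecast recursion
omega <- 485.147; alpha1 <- 0.24396; beta1 <- 0.73411

sigma2 <- numeric(3)
sigma2[1] <- 51.44^2  # printed one-step-ahead sigma, squared
for (h in 2:3) sigma2[h] <- omega + (alpha1 + beta1) * sigma2[h - 1]

# reproduces the printed T+2 and T+3 sigmas (55.44, 59.08) to rounding
stopifnot(abs(sqrt(sigma2[2]) - 55.44) < 0.05,
          abs(sqrt(sigma2[3]) - 59.08) < 0.05)
```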

alum_var_t <- c(tail(alum_garch_var,20),rep(NA,10))  # gets the last 20 observations
alum_res_t <- c(tail(alum_garch_res,20),rep(NA,10))  # gets the last 20 observations
alum_f <- c(rep(NA,20),(ug_alum)^2)

plot(alum_res_t, type = "l") #Squared residuals
lines(alum_f, col = "orange") #Forecast variance
lines(alum_var_t, col = "green") #Conditional variance

#Plot Predictions
#plot(alum_forecast_garch2, main = "Forecasted alum Prices (GARCH(1,1))")
#legend("bottomright", legend = c("Mean", "Lower 95% CI", "Upper 95% CI"), col = c("black", "blue", "red"), lty = 1)

alum_mean_forecast <- as.numeric(alum_forecast_garch2@forecast$seriesFor)

16 Aluminum Model Comparison

#ARIMA Models
forecast::accuracy(aluminum_fit)
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -3.178964 106.1314 77.14823 -0.225094 3.592564 0.2066838
##                      ACF1
## Training set -0.001764145
forecast::accuracy(aluminum_fit2)
##                     ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -3.223632 107.4307 79.04678 -0.2338294 3.705623 0.2117701
##                      ACF1
## Training set -0.000467848
forecast::accuracy(auto_aluminum)
##                     ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -1.456311 105.1855 78.26729 -0.3282913 3.671699 0.2096818
##                     ACF1
## Training set 0.003850494
forecast::accuracy(aluminum_fitR)
##                     ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -7.505676 117.4017 87.07036 -0.3747184 3.776837 0.1736574
##                    ACF1
## Training set 0.02022298
AIC(aluminum_fit)
## [1] 2988.035
BIC(aluminum_fit)
## [1] 3012.515
AIC(aluminum_fit2)
## [1] 2988.132
BIC(aluminum_fit2)
## [1] 3009.115
AIC(aluminum_fitR)
## [1] 603.3853
BIC(aluminum_fitR)
## [1] 610.8701
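A caveat on the comparison above: AIC and BIC are only comparable between models fit to the same data. aluminum_fitR's much lower AIC (603 vs roughly 2988) reflects its much shorter 2019+ series, not a better fit than the 30-year models. A hypothetical demo shows the same model scoring very different AICs on different sample sizes:

```r
# AIC depends on sample size, so it cannot compare fits across datasets
set.seed(11)
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = 300))

fit_long  <- arima(x, order = c(1, 0, 0))           # full series
fit_short <- arima(x[251:300], order = c(1, 0, 0))  # last 50 points only

stopifnot(AIC(fit_short) < AIC(fit_long))  # smaller n, smaller AIC
```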
#GARCH Model

#Note: the 3-step forecast is compared against the first observations of the
#series (1993), not a recent hold-out, so these errors are not a true test

actual_values_alum <- head(actual_values_alum, length(forecast_mean_alum))

mae <- mean(abs(forecast_mean_alum - actual_values_alum))
mse <- mean((forecast_mean_alum - actual_values_alum)^2)
rmse <- sqrt(mse)

# Print the results
cat(paste("MAE: ", mae, "\n"))
## MAE:  652.061576157405
cat(paste("MSE: ", mse, "\n"))
## MSE:  426606.693369426
cat(paste("RMSE: ", rmse, "\n"))
## RMSE:  653.151355636216
#GARCH Model 2 (same caveat: compared against the earliest observations)

forecast_mean_alum2 <- as.numeric(alum_forecast_garch2@forecast$seriesFor)
actual_values_alum2 <- as.numeric(window(alum_ts, start = c(1993, 1)))
## Warning in window.default(x, ...): 'start' value not changed
actual_values_alum2 <- head(actual_values_alum2, length(forecast_mean_alum2))


alum_mae <- mean(abs(forecast_mean_alum2 - actual_values_alum2))
alum_mse <- mean((forecast_mean_alum2 - actual_values_alum2)^2)
alum_rmse <- sqrt(alum_mse)
# Print the results
cat(paste("MAE: ", alum_mae, "\n"))
## MAE:  1276.58131835304
cat(paste("MSE: ", alum_mse, "\n"))
## MSE:  1775413.44915285
cat(paste("RMSE: ", alum_rmse, "\n"))
## RMSE:  1332.44641511501

17 Exploring Soybean

# Looking at the price of Soybean over the last 30 years
ggplot(soybean, aes(date, value)) +
  geom_line() +
  labs(title = "Soybean Price by Day")

#Summary
summary(soybean)
##       date                value           month          
##  Min.   :1993-01-04   Min.   : 4.100   Length:7599       
##  1st Qu.:2000-07-13   1st Qu.: 5.940   Class :character  
##  Median :2008-02-05   Median : 8.645   Mode  :character  
##  Mean   :2008-02-02   Mean   : 8.930                     
##  3rd Qu.:2015-08-20   3rd Qu.:10.707                     
##  Max.   :2023-02-17   Max.   :17.690
## Price has increased significantly over 30 years, outpacing inflation.

#Soybean monthly average

soy_m <- soybean %>% group_by(month) %>% mutate(mon_avg = mean(value)) %>%
  select(month, mon_avg)

#Drop Duplicate rows Soybean

soy_m <- soy_m[!duplicated(soy_m),]

#Creating Time series data Soybean

soy_ts <- ts(soy_m$mon_avg,start=c(1993),frequency=12)
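How `ts()` indexes the monthly averages: a toy two-year series (hypothetical values, not the soybean data) starting January 1993 at frequency 12 can be sliced by calendar position with `window()`:

```r
# Toy monthly ts object mirroring the construction above
toy <- ts(1:24, start = c(1993, 1), frequency = 12)

stopifnot(frequency(toy) == 12)
stopifnot(all(start(toy) == c(1993, 1)))
stopifnot(all(end(toy) == c(1994, 12)))  # 24 months later
# window() slices by calendar position: Jan 1994 is the 13th value
stopifnot(as.numeric(window(toy, start = c(1994, 1), end = c(1994, 1))) == 13)
```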

#Plotting Soybean

chartSeries(soy_ts)

## Price jumps in 2008, 2010 and the early 2020s.

#Looking at trends

autoplot(decompose(soy_ts), main="")

#Looking for daily or weekly trends Soybean
soybean_r <- soybean %>%
  filter(date >= as.Date('2022-11-01') & date <= as.Date('2022-12-01'))

ggplot(soybean_r, aes(date, value)) +
  geom_line() +
  labs(title = "Soybean Prices Daily") + xlab("time") + ylab("prices")

## No daily trends detected

18 Soybean ARIMA

##Stationary Test
adf.test(soy_ts, alternative = "stationary")
## 
##  Augmented Dickey-Fuller Test
## 
## data:  soy_ts
## Dickey-Fuller = -2.4842, Lag order = 7, p-value = 0.3725
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(soy_ts), alternative ="stationary")
## Warning in adf.test(diff(soy_ts), alternative = "stationary"): p-value smaller
## than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(soy_ts)
## Dickey-Fuller = -8.898, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
#Differencing drops the p-value to 0.01, below the 5% cutoff
#The differenced series is stationary, so (d)=1

#Custom Arima Model

#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(soy_ts),main='')

#q=5

#Measuring the direct effect each prior period's price has on the current period
#PACF (p)
pacf(diff(soy_ts),main='')

#p=3. Three significant partial autocorrelation lags

# ARIMA Custom (best fit)

soy_fit<- Arima(soy_ts, order=c(3,1,5))
soy_fit
## Series: soy_ts 
## ARIMA(3,1,5) 
## 
## Coefficients:
##          ar1      ar2     ar3      ma1     ma2      ma3      ma4      ma5
##       0.7419  -0.8797  0.6695  -0.3814  0.7881  -0.5498  -0.1477  -0.2369
## s.e.  0.1721   0.0712  0.1508   0.1745  0.0789   0.1561   0.0695   0.0592
## 
## sigma^2 = 0.2638:  log likelihood = -268.02
## AIC=554.05   AICc=554.56   BIC=589.05
#Check residuals

checkresiduals(soy_fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(3,1,5)
## Q* = 19.568, df = 16, p-value = 0.2403
## 
## Model df: 8.   Total lags used: 24
#Auto-fit Arima

auto_soy<- auto.arima(soy_ts)
auto_soy
## Series: soy_ts 
## ARIMA(1,1,0) 
## 
## Coefficients:
##          ar1
##       0.3588
## s.e.  0.0490
## 
## sigma^2 = 0.2724:  log likelihood = -277.08
## AIC=558.16   AICc=558.2   BIC=565.94
##Forecast Plots

##Forecast Custom

autoplot(forecast::forecast(soy_fit, h=12, level=c(80,95)))

##Forecast Auto (higher AIC than the custom fit)

autoplot(forecast::forecast(auto_soy, h=12, level=c(80,95)))

#ARIMA using more recent data

soy_r2 <- soybean %>%
  filter(date >= as.Date('2019-01-01') & date <= as.Date('2023-01-31'))

soy_r2 <- soy_r2 %>% group_by(month) %>% mutate(mon_avg = mean(value)) %>%
  select(month, mon_avg)

#Drop Duplicate rows soy

soy_r2 <- soy_r2[!duplicated(soy_r2),]

soy_tsr <- ts(soy_r2$mon_avg,start=c(2019),frequency=12)

##Stationary Test
adf.test(soy_tsr, alternative = "stationary")
## 
##  Augmented Dickey-Fuller Test
## 
## data:  soy_tsr
## Dickey-Fuller = -3.7175, Lag order = 3, p-value = 0.03277
## alternative hypothesis: stationary
#Already significant at the 5% level

## After first-order differencing
adf.test(diff(soy_tsr), alternative ="stationary")
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(soy_tsr)
## Dickey-Fuller = -3.0875, Lag order = 3, p-value = 0.1398
## alternative hypothesis: stationary
#Differencing actually raises the p-value here; the level series is already stationary
#d=0

#Custom Arima Model

#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(soy_tsr),main='')

#q=6

#Measuring the direct effect each prior period's price has on the current period
#PACF (p)
pacf(diff(soy_tsr),main='')

#p=1. One significant partial autocorrelation lag

# ARIMA Custom (better model)

soy_fitR<- Arima(soy_tsr, order=c(1,0,6))
soy_fitR
## Series: soy_tsr 
## ARIMA(1,0,6) with non-zero mean 
## 
## Coefficients:
##         ar1     ma1     ma2     ma3     ma4     ma5     ma6     mean
##       0.853  0.4663  0.5158  0.5678  0.6085  0.2040  0.3551  12.2780
## s.e.  0.087  0.1555  0.1633  0.1643  0.1593  0.1782  0.1849   1.5803
## 
## sigma^2 = 0.3012:  log likelihood = -38.46
## AIC=94.91   AICc=99.53   BIC=111.94
#Check residuals

checkresiduals(soy_fitR)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,6) with non-zero mean
## Q* = 1.591, df = 3, p-value = 0.6614
## 
## Model df: 7.   Total lags used: 10
##Forecast Custom

autoplot(forecast::forecast(soy_fitR, h=12, level=c(80,95)))

#Printing Predictions

soy_predictions <- forecast::forecast(soy_fitR,h=12)

print(soy_predictions$mean)
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 2023          15.62904 15.89738 15.84372 16.14463 15.71070 15.45801 14.99059
## 2024 13.50309                                                               
##           Sep      Oct      Nov      Dec
## 2023 14.59188 14.25177 13.96165 13.71418
## 2024

19 Soybean GARCH

# Model Creation
garch_model <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
                          mean.model = list(armaOrder = c(0, 0), include.mean = TRUE),
                          distribution.model = "std")
# Fitting Model to Data

soy_garch <- ugarchfit(spec = garch_model, data = soy_ts)

print(soy_garch)
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##          Estimate  Std. Error    t value Pr(>|t|)
## mu       5.784714    0.048936 118.209863 0.000000
## omega    0.057952    0.024057   2.408937 0.015999
## alpha1   0.999000    0.084219  11.861904 0.000000
## beta1    0.000000    0.004047   0.000002 0.999998
## shape   99.999633   65.539550   1.525791 0.127062
## 
## Robust Standard Errors:
##          Estimate  Std. Error   t value Pr(>|t|)
## mu       5.784714    0.106745 54.192013  0.00000
## omega    0.057952    0.052313  1.107794  0.26795
## alpha1   0.999000    0.059852 16.691208  0.00000
## beta1    0.000000    0.004606  0.000002  1.00000
## shape   99.999633   15.751209  6.348696  0.00000
## 
## LogLikelihood : -786.6785 
## 
## Information Criteria
## ------------------------------------
##                    
## Akaike       4.3739
## Bayes        4.4277
## Shibata      4.3735
## Hannan-Quinn 4.3953
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      245.5       0
## Lag[2*(p+q)+(p+q)-1][2]     340.8       0
## Lag[4*(p+q)+(p+q)-1][5]     598.0       0
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                     0.8803  0.3481
## Lag[2*(p+q)+(p+q)-1][5]    1.7840  0.6703
## Lag[4*(p+q)+(p+q)-1][9]    2.9473  0.7676
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]    0.4557 0.500 2.000  0.4996
## ARCH Lag[5]    1.6494 1.440 1.667  0.5539
## ARCH Lag[7]    1.9878 2.315 1.543  0.7199
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  13.5366
## Individual Statistics:             
## mu     0.8402
## omega  0.0176
## alpha1 2.4346
## beta1  1.0869
## shape  3.8682
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.28 1.47 1.88
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value    prob sig
## Sign Bias           2.2671 0.02398  **
## Negative Sign Bias  0.6849 0.49388    
## Positive Sign Bias  1.9662 0.05005   *
## Joint Effect        7.4019 0.06013   *
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     994.5   8.063e-199
## 2    30    1169.8   9.517e-228
## 3    40    1150.3   7.081e-216
## 4    50    1180.0   6.322e-215
## 
## 
## Elapsed time : 0.1432672
#plot(soy_garch, which = 1)

coef(soy_garch)
##           mu        omega       alpha1        beta1        shape 
## 5.784714e+00 5.795219e-02 9.989997e-01 8.449528e-09 9.999963e+01
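Reading these coefficients: alpha1 + beta1 measures volatility persistence, and omega / (1 - alpha1 - beta1) is the implied unconditional variance. Here alpha1 is essentially at its upper bound with beta1 near zero (and shape pinned at 100), which puts the model at the edge of covariance stationarity and suggests the fit is fragile. A sketch with the reported estimates (rounded):

```r
# Persistence and implied unconditional variance from the soy_garch fit
omega <- 0.05795; alpha1 <- 0.999; beta1 <- 0

persistence <- alpha1 + beta1
uncond_var  <- omega / (1 - persistence)

stopifnot(abs(persistence - 0.999) < 1e-6)  # near unit persistence
stopifnot(abs(uncond_var - 57.95) < 0.01)   # omega / 0.001
```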
# Forecasting

horizon <- 3

soy_forecast_garch <- ugarchforecast(soy_garch, n.ahead = horizon)

forecast_mean_soy <- as.numeric(soy_forecast_garch@forecast$seriesFor)
actual_values_soy <- as.numeric(window(soy_ts, start = c(1993, 1)))


#plot(soy_forecast_garch, n.plot = horizon, n.col = 1, plot.type = "single", 
    # main = "GARCH Forecast for soy Prices", ylab = "Price", xlab = "Time") #%>%
#lines(soy_ts[(length(soy_ts)-horizon+1):length(soy_ts)], col = "blue")


#Based on the significance results, try an AR(1) mean model

garch_model <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
                          mean.model = list(armaOrder = c(1, 0), include.mean = TRUE),
                          distribution.model = "std")

soy_garch2 <- ugarchfit(spec = garch_model, data = soy_ts)
soy_garch2
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(1,0,0)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      5.795659    0.274167  21.1392 0.000000
## ar1     0.998623    0.006327 157.8332 0.000000
## omega   0.047614    0.017129   2.7797 0.005441
## alpha1  0.539779    0.175683   3.0725 0.002123
## beta1   0.459221    0.107228   4.2827 0.000018
## shape   3.807732    0.792698   4.8035 0.000002
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      5.795659    0.033196 174.5904 0.000000
## ar1     0.998623    0.006172 161.8113 0.000000
## omega   0.047614    0.016961   2.8073 0.004996
## alpha1  0.539779    0.159444   3.3854 0.000711
## beta1   0.459221    0.135799   3.3816 0.000721
## shape   3.807732    0.697483   5.4592 0.000000
## 
## LogLikelihood : -241.1973 
## 
## Information Criteria
## ------------------------------------
##                    
## Akaike       1.3657
## Bayes        1.4302
## Shibata      1.3652
## Hannan-Quinn 1.3914
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic   p-value
## Lag[1]                      23.05 1.580e-06
## Lag[2*(p+q)+(p+q)-1][2]     23.34 0.000e+00
## Lag[4*(p+q)+(p+q)-1][5]     24.86 1.004e-10
## d.o.f=1
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      1.465  0.2261
## Lag[2*(p+q)+(p+q)-1][5]     2.010  0.6160
## Lag[4*(p+q)+(p+q)-1][9]     4.450  0.5143
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]    0.2109 0.500 2.000  0.6461
## ARCH Lag[5]    0.4451 1.440 1.667  0.8997
## ARCH Lag[7]    1.3605 2.315 1.543  0.8489
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  1.4111
## Individual Statistics:              
## mu     0.28744
## ar1    0.03118
## omega  0.30380
## alpha1 0.39278
## beta1  0.25368
## shape  0.69652
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.49 1.68 2.12
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           1.0402 0.2990    
## Negative Sign Bias  0.6755 0.4998    
## Positive Sign Bias  1.3498 0.1779    
## Joint Effect        3.1373 0.3709    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     25.51      0.14432
## 2    30     45.57      0.02589
## 3    40     48.83      0.13453
## 4    50     64.52      0.06770
## 
## 
## Elapsed time : 0.1329389
##                    t-value   prob sig
## Sign Bias           1.0402 0.2990    
## Negative Sign Bias  0.6755 0.4998    
## Positive Sign Bias  1.3498 0.1779    
## Joint Effect        3.1373 0.3709    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     25.51      0.14432
## 2    30     45.57      0.02589
## 3    40     48.83      0.13453
## 4    50     64.52      0.06770
## 
## 
## Elapsed time : 0.1329389
#Time series of the conditional variance (sigma^2) from the standard GARCH(1,1) fit

soy_vol <-ts(soy_garch2@fit$sigma^2,start=c(1993),frequency=12)

#plot(soy_vol,xlab="",ylab="",main="soy_Volatility (GARCH[1,1])")


#GARCH 3

names(soy_garch2@model)
##  [1] "modelinc"   "modeldesc"  "modeldata"  "pars"       "start.pars"
##  [6] "fixed.pars" "maxOrder"   "pos.matrix" "fmodel"     "pidx"      
## [11] "n.start"
names(soy_garch2@fit)
##  [1] "hessian"         "cvar"            "var"             "sigma"          
##  [5] "condH"           "z"               "LLH"             "log.likelihoods"
##  [9] "residuals"       "coef"            "robust.cvar"     "A"              
## [13] "B"               "scores"          "se.coef"         "tval"           
## [17] "matcoef"         "robust.se.coef"  "robust.tval"     "robust.matcoef" 
## [21] "fitted.values"   "convergence"     "kappa"           "persistence"    
## [25] "timer"           "ipars"           "solver"
#Variance
soy_garch_var <- soy_garch2@fit$var
#Residuals
soy_garch_res <- (soy_garch2@fit$residuals)^2

#Plotting residuals and conditional variances
plot(soy_garch_res, type = "l")
lines(soy_garch_var, col = "green")

soy_forecast_garch2 <- ugarchforecast(soy_garch2, n.ahead = 12)
soy_forecast_garch2
## 
## *------------------------------------*
## *       GARCH Model Forecast         *
## *------------------------------------*
## Model: sGARCH
## Horizon: 12
## Roll Steps: 0
## Out of Sample: 0
## 
## 0-roll forecast [T0=Feb 2023]:
##      Series  Sigma
## T+1   15.27 0.4542
## T+2   15.25 0.5037
## T+3   15.24 0.5487
## T+4   15.23 0.5902
## T+5   15.21 0.6290
## T+6   15.20 0.6655
## T+7   15.19 0.7000
## T+8   15.18 0.7329
## T+9   15.16 0.7644
## T+10  15.15 0.7945
## T+11  15.14 0.8236
## T+12  15.12 0.8516
ug_soy <- soy_forecast_garch2@forecast$sigmaFor
#plot(ug_soy, type = "l")

soy_var_t <- c(tail(soy_garch_var,20),rep(NA,12))  # last 20 observations, padded for the 12-step horizon
soy_res_t <- c(tail(soy_garch_res,20),rep(NA,12))  # last 20 observations, padded for the 12-step horizon
soy_f <- c(rep(NA,20),(ug_soy)^2)                  # squared forecast sigma aligned after the history

plot(soy_res_t, type = "l") # Squared residuals
lines(soy_f, col = "orange") # Forecast variance
lines(soy_var_t, col = "green") # Conditional variance
legend("topright", 
       legend = c("Squared Residuals", "Forecast Variance", "Conditional Variance"), 
       col = c("black", "orange", "green"), 
       lty = c(1, 1, 1), 
       cex = 0.8)

#Plot Predictions
#plot(soy_forecast_garch2, main = "Forecasted soy Prices (GARCH(1,1))")
#legend("bottomright", legend = c("Mean", "Lower 95% CI", "Upper 95% CI"), col = c("black", "blue", "red"), lty = 1)

soy_mean_forecast <- as.numeric(soy_forecast_garch2@forecast$seriesFor)


# Plot the mean forecasted values with the two confidence intervals

#plot(soy_forecast_garch2, main = "Forecasted soy Prices (GARCH(1,1))")

20 Soybean Model Comparison

#ARIMA Models
forecast::accuracy(soy_fit)
##                      ME      RMSE       MAE       MPE     MAPE      MASE
## Training set 0.02571002 0.5071781 0.3576524 0.1286753 3.985675 0.2222804
##                     ACF1
## Training set -0.00724677
forecast::accuracy(auto_soy)
##                      ME      RMSE       MAE      MPE     MAPE      MASE
## Training set 0.01700553 0.5204963 0.3658424 0.109258 4.049421 0.2273705
##                     ACF1
## Training set 0.003829795
forecast::accuracy(soy_fitR)
##                      ME      RMSE       MAE        MPE     MAPE      MASE
## Training set 0.03335124 0.5020207 0.4066544 0.03725125 3.276699 0.1832952
##                    ACF1
## Training set 0.00698056
AIC(soy_fit)
## [1] 554.0485
BIC(soy_fit)
## [1] 589.0484
AIC(soy_fitR)
## [1] 94.91211
BIC(soy_fitR)
## [1] 111.9385
#GARCH Model

actual_values_soy <- head(actual_values_soy, length(forecast_mean_soy))

mae <- mean(abs(forecast_mean_soy - actual_values_soy))
mse <- mean((forecast_mean_soy - actual_values_soy)^2)
rmse <- sqrt(mse)

# Print the results
cat(paste("MAE: ", mae, "\n"))
## MAE:  0.0434758867098116
cat(paste("MSE: ", mse, "\n"))
## MSE:  0.00269589445359164
cat(paste("RMSE: ", rmse, "\n"))
## RMSE:  0.0519220035591043
#GARCH Model 2

forecast_mean_soy2 <- as.numeric(soy_forecast_garch2@forecast$seriesFor)
actual_values_soy2 <- as.numeric(window(soy_ts, start = c(1993, 1)))
actual_values_soy2 <- head(actual_values_soy2, length(forecast_mean_soy2))

soy_mae <- mean(abs(forecast_mean_soy2 - actual_values_soy2))
soy_mse <- mean((forecast_mean_soy2 - actual_values_soy2)^2)
soy_rmse <- sqrt(soy_mse)
# Print the results
cat(paste("MAE: ", soy_mae, "\n"))
## MAE:  8.9266536433183
cat(paste("MSE: ", soy_mse, "\n"))
## MSE:  79.9110106038158
cat(paste("RMSE: ", soy_rmse, "\n"))
## RMSE:  8.93929586733853
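The metrics above are easier to scan side by side. This is a sketch that collects the values printed by the accuracy() and cat() calls into one data frame (the numbers are copied from the outputs above; note the GARCH rows are forecast-vs-actual errors computed over different comparison windows, so they are not directly comparable to the ARIMA training-set metrics):

```r
# Side-by-side view of the soybean metrics reported above
# (values copied from the accuracy() and cat() outputs)
soy_compare <- data.frame(
  model = c("soy_fit (custom ARIMA)", "auto_soy", "soy_fitR (recent data)",
            "soy_garch", "soy_garch2"),
  RMSE  = c(0.5072, 0.5205, 0.5020, 0.0519, 8.9393),
  MAE   = c(0.3577, 0.3658, 0.4067, 0.0435, 8.9267)
)
soy_compare[order(soy_compare$RMSE), ]
```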

21 Exploring Soybean Oil

# Looking at the price of Soybean Oil over the last 30 years
ggplot(soybeanoil, aes(date, price)) +
  geom_line() +
  labs(title = "Soybean Oil Price by Day")

#Summary
summary(soybeanoil)
##       date                price           month          
##  Min.   :1993-01-04   Min.   :0.1441   Length:7428       
##  1st Qu.:2000-05-11   1st Qu.:0.2359   Class :character  
##  Median :2007-10-02   Median :0.2877   Mode  :character  
##  Mean   :2007-10-04   Mean   :0.3235                     
##  3rd Qu.:2015-02-26   3rd Qu.:0.3757                     
##  Max.   :2022-06-30   Max.   :0.8820
## All time high price in June of 2022
  
#Soybeanoil monthly average

soyo_m <- soybeanoil %>%
  group_by(month) %>%
  mutate(mon_avg = mean(price)) %>%
  select(month, mon_avg)

#Drop Duplicate rows Soybean Oil

soyo_m <- soyo_m[!duplicated(soyo_m),]

#Creating Time series data Soybean oil

soyo_ts <- ts(soyo_m$mon_avg,start=c(1993),frequency=12)
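The monthly-averaging step used throughout (group by month, take the mean, drop duplicate rows) can also be done in base R with aggregate. A toy sketch with made-up daily prices (the dates and values here are illustrative, not the soybeanoil data):

```r
# Toy daily data aggregated to monthly means, mirroring the
# group_by/mutate/de-duplicate pipeline above
set.seed(1)
dates <- seq(as.Date("1993-01-01"), as.Date("1993-06-30"), by = "day")
toy   <- data.frame(date = dates, price = runif(length(dates), 0.14, 0.30))
toy$month <- format(toy$date, "%Y-%m")
mon_avg <- aggregate(price ~ month, data = toy, FUN = mean)
toy_ts  <- ts(mon_avg$price, start = c(1993, 1), frequency = 12)
```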

#Plotting Soybean Oil

chartSeries(soyo_ts)

## Price spikes in 2008, 2012, and the early 2020s

#Looking at trends
autoplot(decompose((soyo_ts)),main="") 

#Looking for daily or weekly trends Soybean oil
soyo_r <- soybeanoil %>%
filter(date >= as.Date('2022-04-01') & date <= as.Date('2023-06-01'))

ggplot(soyo_r, aes(date, price)) +
  geom_line() +
  labs(title = "Daily Soybean Oil Prices") + xlab("Date") + ylab("Price")

# No price trends by day seen

22 Soybean oil ARIMA models

##Stationary Test
adf.test(soyo_ts, alternative = "stationary")
## 
##  Augmented Dickey-Fuller Test
## 
## data:  soyo_ts
## Dickey-Fuller = -1.9416, Lag order = 7, p-value = 0.6014
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(soyo_ts), alternative ="stationary")
## Warning in adf.test(diff(soyo_ts), alternative = "stationary"): p-value smaller
## than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(soyo_ts)
## Dickey-Fuller = -7.4399, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
#Differencing drops the ADF p-value below 0.05, rejecting the unit-root null
#The differenced series is stationary, so d = 1
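The effect of first-order differencing is easy to see on a simulated random walk: the level series wanders (a unit root, like the undifferenced prices), while its first difference is just the stationary innovations. A minimal base-R illustration:

```r
# A random walk is non-stationary; its first difference is white noise
set.seed(42)
rw <- cumsum(rnorm(360))   # simulated monthly price level with a unit root
d1 <- diff(rw)             # first difference (d = 1) is stationary
c(sd_level = sd(rw), sd_diff = sd(d1))
```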

#Custom Arima Model

#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(soyo_ts),main='')

#q=4

#Looking at impact of price prior periods have on consecutive time periods
#PACF (p)
pacf(diff(soyo_ts),main='')

#p=3. The PACF shows three significant partial autocorrelation lags
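As a sanity check on reading p off the PACF: for a simulated AR(p) process, the sample PACF has large values at lags 1 through p and cuts off afterwards. A toy example with p = 3 (the coefficients are hypothetical):

```r
# Sample PACF of a simulated AR(3): the first three lags stand out,
# later lags shrink toward zero
set.seed(3)
sim <- arima.sim(model = list(ar = c(0.4, -0.2, 0.1)), n = 1000)
round(pacf(sim, plot = FALSE)$acf[1:6], 3)
```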

# ARIMA Custom(best fit)

soyo_fit<- Arima(soyo_ts, order=c(3,1,4))
soyo_fit
## Series: soyo_ts 
## ARIMA(3,1,4) 
## 
## Coefficients:
##          ar1      ar2      ar3      ma1      ma2     ma3     ma4
##       0.8973  -0.1191  -0.4720  -0.4854  -0.0275  0.3970  0.3082
## s.e.  0.1993   0.2881   0.1848   0.1921   0.2209  0.1205  0.0671
## 
## sigma^2 = 0.0003321:  log likelihood = 915.98
## AIC=-1815.96   AICc=-1815.54   BIC=-1785.03
#Check residuals

checkresiduals(soyo_fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(3,1,4)
## Q* = 20.652, df = 17, p-value = 0.2423
## 
## Model df: 7.   Total lags used: 24
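checkresiduals runs the Ljung-Box test shown above; the same test is available directly via base R's Box.test. On simulated white noise (what well-behaved residuals should look like), the p-value is typically large, meaning no evidence of leftover autocorrelation:

```r
# Ljung-Box test on simulated white noise; a large p-value means no
# evidence of residual autocorrelation
set.seed(11)
lb <- Box.test(rnorm(300), lag = 24, type = "Ljung-Box")
lb$p.value
```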
#Auto-fit Arima

auto_soyo<- auto.arima(soyo_ts)
auto_soyo
## Series: soyo_ts 
## ARIMA(1,1,0) 
## 
## Coefficients:
##          ar1
##       0.4295
## s.e.  0.0484
## 
## sigma^2 = 0.0003439:  log likelihood = 907.13
## AIC=-1810.27   AICc=-1810.23   BIC=-1802.53
##Forecast Plots

##Forecast Custom(better model)

autoplot(forecast::forecast(soyo_fit, h=12, level=c(80,95)))

##Forecast Auto (overfit)

autoplot(forecast::forecast(auto_soyo, h=12, level=c(80,95)))

#ARIMA using more recent data

soyo_r2 <- soybeanoil %>%
filter(date >= as.Date('2016-01-01') & date <= as.Date('2023-01-31'))

soyo_r2 <- soyo_r2 %>%
  group_by(month) %>%
  mutate(mon_avg = mean(price)) %>%
  select(month, mon_avg)

#Drop Duplicate rows

soyo_r2 <- soyo_r2[!duplicated(soyo_r2),]

soyo_tsr <- ts(soyo_r2$mon_avg,start=c(2016),frequency=12)

##Stationary Test
adf.test(soyo_tsr, alternative = "stationary")
## 
##  Augmented Dickey-Fuller Test
## 
## data:  soyo_tsr
## Dickey-Fuller = -0.68864, Lag order = 4, p-value = 0.9674
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(soyo_tsr), alternative ="stationary")
## Warning in adf.test(diff(soyo_tsr), alternative = "stationary"): p-value smaller
## than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(soyo_tsr)
## Dickey-Fuller = -4.5721, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
#Differencing drops the ADF p-value below 0.05, rejecting the unit-root null
#d=1

#Custom Arima Model

#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(soyo_tsr),main='')

#q=1

#Looking at impact of price prior periods have on consecutive time periods
#PACF (p)
pacf(diff(soyo_tsr),main='')

#p=1. The PACF shows a single dominant partial autocorrelation lag

# ARIMA Custom(better model)

soyo_fitR<- Arima(soyo_tsr, order=c(1,1,1))
soyo_fitR
## Series: soyo_tsr 
## ARIMA(1,1,1) 
## 
## Coefficients:
##          ar1      ma1
##       0.5389  -0.0714
## s.e.  0.1668   0.1882
## 
## sigma^2 = 0.0005035:  log likelihood = 183.98
## AIC=-361.97   AICc=-361.64   BIC=-354.93
#Check residuals

checkresiduals(soyo_fitR)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,1,1)
## Q* = 12.019, df = 14, p-value = 0.6048
## 
## Model df: 2.   Total lags used: 16
##Forecast Custom

autoplot(forecast::forecast(soyo_fitR, h=12, level=c(80,95)))

#Printing Predictions

soyo_predictions <- forecast::forecast(soyo_fitR,h=12)

print(soyo_predictions$mean)
##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 2022                                                             0.7382803
## 2023 0.7128862 0.7125922 0.7124337 0.7123484 0.7123024 0.7122776          
##            Aug       Sep       Oct       Nov       Dec
## 2022 0.7262769 0.7198084 0.7163225 0.7144440 0.7134317
## 2023

23 GARCH Soybean Oil Modeling

# Model Creation
garch_model <- ugarchspec(variance.model = list(model = "sGARCH",
                                                garchOrder = c(1,1)),
                          mean.model = list(armaOrder = c(0,0),
                                            include.mean = TRUE),
                          distribution.model = "std")
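The sGARCH(1,1) specified above models the conditional variance with the recursion sigma2[t] = omega + alpha1 * eps[t-1]^2 + beta1 * sigma2[t-1]. A minimal hand-rolled sketch of that recursion (the garch11_var helper and its inputs are illustrative, not part of rugarch):

```r
# sGARCH(1,1) conditional-variance recursion, written out directly:
# sigma2[t] = omega + alpha1 * eps[t-1]^2 + beta1 * sigma2[t-1]
garch11_var <- function(eps, omega, alpha1, beta1) {
  sigma2 <- numeric(length(eps))
  sigma2[1] <- var(eps)                       # a common initialization choice
  for (t in 2:length(eps))
    sigma2[t] <- omega + alpha1 * eps[t - 1]^2 + beta1 * sigma2[t - 1]
  sigma2
}
set.seed(7)
sig2 <- garch11_var(rnorm(200), omega = 0.03, alpha1 = 0.10, beta1 = 0.85)
head(round(sig2, 3))
```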
# Fitting Model to Data

soyo_garch <- ugarchfit(spec = garch_model, data = soyo_ts)

print(soyo_garch)
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu     2.654e-01    0.005163  51.4028 0.000000
## omega  1.320e-04    0.000090   1.4727 0.140832
## alpha1 9.990e-01    0.361641   2.7624 0.005738
## beta1  0.000e+00    0.354191   0.0000 1.000000
## shape  1.000e+02   57.346231   1.7438 0.081195
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu     2.654e-01    0.033620  7.89424  0.00000
## omega  1.320e-04    0.000465  0.28508  0.77558
## alpha1 9.990e-01    2.128517  0.46934  0.63883
## beta1  0.000e+00    2.119646  0.00000  1.00000
## shape  1.000e+02    2.772805 36.06456  0.00000
## 
## LogLikelihood : 470.9922 
## 
## Information Criteria
## ------------------------------------
##                     
## Akaike       -2.6327
## Bayes        -2.5781
## Shibata      -2.6331
## Hannan-Quinn -2.6110
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      282.7       0
## Lag[2*(p+q)+(p+q)-1][2]     404.9       0
## Lag[4*(p+q)+(p+q)-1][5]     715.7       0
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      3.581 0.05846
## Lag[2*(p+q)+(p+q)-1][5]    10.024 0.00908
## Lag[4*(p+q)+(p+q)-1][9]    14.070 0.00598
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]    0.4061 0.500 2.000  0.5240
## ARCH Lag[5]    1.4593 1.440 1.667  0.6031
## ARCH Lag[7]    4.0776 2.315 1.543  0.3355
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  80.7041
## Individual Statistics:              
## mu      7.8160
## omega   0.5892
## alpha1  0.4907
## beta1   0.1933
## shape  41.3785
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.28 1.47 1.88
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           1.1676 0.2438    
## Negative Sign Bias  0.0256 0.9796    
## Positive Sign Bias  0.5212 0.6026    
## Joint Effect        2.5982 0.4578    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     505.2    4.510e-95
## 2    30     555.2    1.241e-98
## 3    40     537.3    7.159e-89
## 4    50     551.1    4.132e-86
## 
## 
## Elapsed time : 0.1418879
#plot(soyo_garch, which = 1)

coef(soyo_garch)
##           mu        omega       alpha1        beta1        shape 
## 2.654041e-01 1.324920e-04 9.989999e-01 1.589806e-08 9.999998e+01
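One quick read on the coefficients above: alpha1 + beta1 measures volatility persistence, and the fitted alpha1 of about 0.999 with beta1 of about 0 puts this model right at the edge of covariance stationarity (alpha1 + beta1 < 1 is required for a finite unconditional variance), which is one more sign the fit is suspect. A sketch using the printed values:

```r
# Persistence check from the printed coefficients (values copied from above)
alpha1 <- 0.9990; beta1 <- 0.0000
persistence <- alpha1 + beta1               # must be < 1 for stationarity
uncond_var  <- 1.32e-04 / (1 - persistence) # omega / (1 - alpha1 - beta1)
c(persistence = persistence, uncond_var = uncond_var)
```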
# Forecasting

horizon <- 3

soyo_vol <-ts(soyo_garch@fit$sigma^2,start=c(1993),frequency=12)

soyo_forecast_garch <- ugarchforecast(soyo_garch, n.ahead = 12)

forecast_mean_soyo <- as.numeric(soyo_forecast_garch@forecast$seriesFor)
actual_values_soyo <- as.numeric(window(soyo_ts, start = c(1993, 1)))


#plot(soyo_forecast_garch, n.plot = horizon, n.col = 1, plot.type = "single", 
     #main = "GARCH Forecast for soyo Prices", ylab = "Price", xlab = "Time") #%>%
#lines(soyo_ts[(length(soyo_ts)-horizon+1):length(soyo_ts)], col = "blue")


#Given the weak parameter significance above, try an AR(1) mean model instead

garch_model2 <- ugarchspec(variance.model = list(model = "sGARCH",
                                                 garchOrder = c(1,1)),
                           mean.model = list(armaOrder = c(1,0),
                                             include.mean = TRUE),
                           distribution.model = "std")

soyo_garch2 <- ugarchfit(spec = garch_model2, data = soyo_ts)
## Warning in arima(data, order = c(modelinc[2], 0, modelinc[3]), include.mean =
## modelinc[1], : possible convergence problem: optim gave code = 1
## Warning in .sgarchfit(spec = spec, data = data, out.sample = out.sample, : 
## ugarchfit-->warning: solver failer to converge.
soyo_garch2
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(1,0,0)
## Distribution : std 
## 
## Convergence Problem:
## Solver Message:
#Time series of the conditional variance (sigma^2) from the standard GARCH(1,1) fit

#soyo_vol2 <-ts(soyo_garch2@fit$sigma^2,start=c(1993),frequency=12)

#plot(soyo_vol2,xlab="",ylab="",main="soyo_Volatility (GARCH[1,1])")



#GARCH 3

names(soyo_garch2@model)
##  [1] "modelinc"   "modeldesc"  "modeldata"  "pars"       "start.pars"
##  [6] "fixed.pars" "maxOrder"   "pos.matrix" "fmodel"     "pidx"      
## [11] "n.start"
names(soyo_garch2@fit)
## [1] "convergence" "solver"
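When the solver fails like this, the fit slot holds only the convergence flag and the solver name, so none of the downstream extractions (sigma, residuals, forecasts) are available. A common first step, sketched here under the assumption that the rugarch objects above are in the session, is to retry with rugarch's "hybrid" solver, which cycles through several optimizers until one converges (soyo_garch2b is an illustrative name):

```r
# Retry the failed fit with a different solver (sketch, not run above)
soyo_garch2b <- ugarchfit(spec = garch_model2, data = soyo_ts,
                          solver = "hybrid")
convergence(soyo_garch2b)   # 0 indicates a successful solve
```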
#Variance
soyo_garch_var <- soyo_garch2@fit$var
#Residuals
soyo_garch_res <- (soyo_garch2@fit$residuals)^2

#Plotting residuals and conditional variances
#plot(soyo_garch_res, type = "l")
#lines(soyo_garch_var, col = "green")

#soyo_forecast_garch2 <- ugarchforecast(soyo_garch2, n.ahead = 12)
#soyo_forecast_garch2

24 Soybean Oil Model Comparison

#ARIMA Models
forecast::accuracy(soyo_fit)
##                        ME      RMSE        MAE       MPE     MAPE      MASE
## Training set 0.0008097532 0.0180158 0.01283394 0.1543794 3.959575 0.1940801
##                      ACF1
## Training set -0.006675165
forecast::accuracy(auto_soyo)
##                        ME       RMSE        MAE       MPE     MAPE      MASE
## Training set 0.0008238039 0.01849189 0.01313962 0.1557511 4.030274 0.1987027
##                      ACF1
## Training set -0.008424143
forecast::accuracy(soyo_fitR)
##                       ME       RMSE        MAE       MPE     MAPE      MASE
## Training set 0.002605911 0.02200404 0.01554205 0.5402573 3.714631 0.1878383
##                     ACF1
## Training set -0.01925028
AIC(soyo_fit)
## [1] -1815.961
BIC(soyo_fit)
## [1] -1785.029
AIC(soyo_fitR)
## [1] -361.9659
BIC(soyo_fitR)
## [1] -354.9345
#GARCH Model

actual_values_soyo <- head(actual_values_soyo, length(forecast_mean_soyo))

mae <- mean(abs(forecast_mean_soyo - actual_values_soyo))
mse <- mean((forecast_mean_soyo - actual_values_soyo)^2)
rmse <- sqrt(mse)

# Print the results
cat(paste("MAE: ", mae, "\n"))
## MAE:  0.0388457606662608
cat(paste("MSE: ", mse, "\n"))
## MSE:  0.0017633098679844
cat(paste("RMSE: ", rmse, "\n"))
## RMSE:  0.041991783338939

25 Exploring Corn

# Looking at the price of Corn over the last 30 years
ggplot(corn, aes(date, price)) +
  geom_line() +
  labs(title = "Corn Price by Day")

#Summary Corn
summary(corn)
##       date                price          month          
##  Min.   :1993-01-04   Min.   :1.748   Length:7605       
##  1st Qu.:2000-07-18   1st Qu.:2.365   Class :character  
##  Median :2008-02-05   Median :3.506   Mode  :character  
##  Mean   :2008-02-02   Mean   :3.696                     
##  3rd Qu.:2015-08-20   3rd Qu.:4.210                     
##  Max.   :2023-02-17   Max.   :8.312
## Corn price reached an all time high in February of 2023
  
#Corn monthly average

corn_m <- corn %>%
  group_by(month) %>%
  mutate(mon_avg = mean(price)) %>%
  select(month, mon_avg)

#Drop Duplicate rows Corn

corn_m <- corn_m[!duplicated(corn_m),]

#Creating Time series data Corn

corn_ts <- ts(corn_m$mon_avg,start=c(1993),frequency=12)

#Plotting Corn

chartSeries(corn_ts)

## Price spikes in 1995, 2008, the 2010s, and the 2020s

#Looking at trends

autoplot(decompose((corn_ts)),main="") 

#Looking for daily or weekly trends Corn
corn_r <- corn %>%
filter(date >= as.Date('2022-11-01') & date <= as.Date('2023-01-31'))

ggplot(corn_r, aes(date, price)) +
  geom_line() +
  labs(title = "Daily Corn Prices") + xlab("Date") + ylab("Price")

## Price does not seem to fluctuate by day of the week

26 Corn ARIMA models

##Stationary Test
adf.test(corn_ts, alternative = "stationary")
## 
##  Augmented Dickey-Fuller Test
## 
## data:  corn_ts
## Dickey-Fuller = -2.6343, Lag order = 7, p-value = 0.3092
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(corn_ts), alternative ="stationary")
## Warning in adf.test(diff(corn_ts), alternative = "stationary"): p-value smaller
## than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(corn_ts)
## Dickey-Fuller = -8.179, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
#Differencing drops the ADF p-value below 0.05, rejecting the unit-root null
#The differenced series is stationary, so d = 1

#Custom Arima Model

#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(corn_ts),main='')

#q=4

#Looking at impact of price prior periods have on consecutive time periods
#PACF (p)
pacf(diff(corn_ts),main='')

#p=4. The PACF shows four significant partial autocorrelation lags

# ARIMA Custom(best fit)

corn_fit<- Arima(corn_ts, order=c(4,1,4))
corn_fit
## Series: corn_ts 
## ARIMA(4,1,4) 
## 
## Coefficients:
##          ar1      ar2     ar3      ar4      ma1     ma2      ma3      ma4
##       1.1854  -1.0570  0.8632  -0.0586  -0.8941  0.8123  -0.5885  -0.2947
## s.e.  0.1538   0.1969  0.1838   0.1383   0.1474  0.1808   0.1789   0.1434
## 
## sigma^2 = 0.07194:  log likelihood = -34.06
## AIC=86.12   AICc=86.63   BIC=121.12
#Check residuals

checkresiduals(corn_fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(4,1,4)
## Q* = 33.005, df = 16, p-value = 0.007378
## 
## Model df: 8.   Total lags used: 24
#Auto-fit Arima

auto_corn<- auto.arima(corn_ts)
auto_corn
## Series: corn_ts 
## ARIMA(0,1,1)(0,0,1)[12] 
## 
## Coefficients:
##          ma1     sma1
##       0.3229  -0.1350
## s.e.  0.0504   0.0571
## 
## sigma^2 = 0.07494:  log likelihood = -43.71
## AIC=93.41   AICc=93.48   BIC=105.08
checkresiduals(auto_corn)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1)(0,0,1)[12]
## Q* = 37.689, df = 22, p-value = 0.01985
## 
## Model df: 2.   Total lags used: 24
##Forecast Plots

##Forecast Custom(better model)

autoplot(forecast::forecast(corn_fit, h=12, level=c(80,95)))

##Forecast Auto (overfit)

autoplot(forecast::forecast(auto_corn, h=12, level=c(80,95)))

#ARIMA using more recent data

corn_r2 <- corn %>%
filter(date >= as.Date('2016-01-01') & date <= as.Date('2023-01-31'))

corn_r2 <- corn_r2 %>%
  group_by(month) %>%
  mutate(mon_avg = mean(price)) %>%
  select(month, mon_avg)

#Drop Duplicate rows

corn_r2 <- corn_r2[!duplicated(corn_r2),]

corn_tsr <- ts(corn_r2$mon_avg,start=c(2016),frequency=12)

##Stationary Test
adf.test(corn_tsr, alternative = "stationary")
## 
##  Augmented Dickey-Fuller Test
## 
## data:  corn_tsr
## Dickey-Fuller = -2.2964, Lag order = 4, p-value = 0.4543
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(corn_tsr), alternative ="stationary")
## Warning in adf.test(diff(corn_tsr), alternative = "stationary"): p-value smaller
## than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(corn_tsr)
## Dickey-Fuller = -4.7015, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
#Differencing drops the ADF p-value below 0.05, rejecting the unit-root null
#d=1

#Custom Arima Model

#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(corn_tsr),main='')

#q=1

#Looking at impact of price prior periods have on consecutive time periods
#PACF (p)
pacf(diff(corn_tsr),main='')

#p=3. The PACF shows three significant partial autocorrelation lags

# ARIMA Custom(overfit)

corn_fitR<- Arima(corn_tsr, order=c(3,1,1))
corn_fitR
## Series: corn_tsr 
## ARIMA(3,1,1) 
## 
## Coefficients:
##          ar1      ar2     ar3     ma1
##       0.3584  -0.2581  0.0789  0.1171
## s.e.  0.5370   0.2613  0.1792  0.5311
## 
## sigma^2 = 0.07726:  log likelihood = -9.75
## AIC=29.5   AICc=30.27   BIC=41.66
#Check residuals

checkresiduals(corn_fitR)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(3,1,1)
## Q* = 16.143, df = 13, p-value = 0.2415
## 
## Model df: 4.   Total lags used: 17
#Auto-fit Arima(overfit)

auto_cornR<- auto.arima(corn_tsr)
auto_cornR
## Series: corn_tsr 
## ARIMA(0,1,1) 
## 
## Coefficients:
##          ma1
##       0.4938
## s.e.  0.0957
## 
## sigma^2 = 0.07547:  log likelihood = -10.3
## AIC=24.59   AICc=24.74   BIC=29.46
checkresiduals(auto_cornR)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1)
## Q* = 16.545, df = 16, p-value = 0.4156
## 
## Model df: 1.   Total lags used: 17
##Forecast Plot

autoplot(forecast::forecast(auto_cornR, h=12, level=c(80,95)))

##Forecast Custom

autoplot(forecast::forecast(corn_fitR, h=12, level=c(80,95)))

#Printing Predictions

corn_predictions <- forecast::forecast(corn_fitR,h=12)

print(corn_predictions$mean)
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 2023          6.833895 6.815075 6.791769 6.798035 6.804812 6.803784 6.802160
## 2024 6.802662                                                               
##           Sep      Oct      Nov      Dec
## 2023 6.802379 6.802795 6.802760 6.802657
## 2024
corn_predictions <- forecast::forecast(auto_cornR,h=12)

print(corn_predictions$mean)
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 2023          6.837366 6.837366 6.837366 6.837366 6.837366 6.837366 6.837366
## 2024 6.837366                                                               
##           Sep      Oct      Nov      Dec
## 2023 6.837366 6.837366 6.837366 6.837366
## 2024

27 GARCH Corn Modeling

# Model Creation
garch_model <- ugarchspec(variance.model = list(model = "sGARCH",
                                                garchOrder = c(1,1)),
                          mean.model = list(armaOrder = c(0,0),
                                            include.mean = TRUE),
                          distribution.model = "std")
# Fitting Model to Data

corn_garch <- ugarchfit(spec = garch_model, data = corn_ts)

print(corn_garch)
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      3.629372    0.029175 124.3990  0.00000
## omega   0.032509    0.005407   6.0122  0.00000
## alpha1  0.998999    0.047015  21.2486  0.00000
## beta1   0.000000    0.080373   0.0000  1.00000
## shape  99.992685   68.196361   1.4662  0.14258
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      3.629372    0.055652  65.2160 0.000000
## omega   0.032509    0.009629   3.3762 0.000735
## alpha1  0.998999    0.119584   8.3540 0.000000
## beta1   0.000000    0.118365   0.0000 1.000000
## shape  99.992685   15.275796   6.5458 0.000000
## 
## LogLikelihood : -456.3449 
## 
## Information Criteria
## ------------------------------------
##                    
## Akaike       2.5489
## Bayes        2.6026
## Shibata      2.5485
## Hannan-Quinn 2.5702
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      261.4       0
## Lag[2*(p+q)+(p+q)-1][2]     357.4       0
## Lag[4*(p+q)+(p+q)-1][5]     580.3       0
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic   p-value
## Lag[1]                      13.09 0.0002968
## Lag[2*(p+q)+(p+q)-1][5]     14.17 0.0007417
## Lag[4*(p+q)+(p+q)-1][9]     16.46 0.0015718
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]    0.8891 0.500 2.000  0.3457
## ARCH Lag[5]    1.3093 1.440 1.667  0.6439
## ARCH Lag[7]    2.7544 2.315 1.543  0.5611
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  47.7516
## Individual Statistics:              
## mu      3.7398
## omega   0.2365
## alpha1  0.5115
## beta1   1.1196
## shape  29.9616
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.28 1.47 1.88
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value    prob sig
## Sign Bias            2.489 0.01328  **
## Negative Sign Bias   1.138 0.25570    
## Positive Sign Bias   1.172 0.24180    
## Joint Effect         7.104 0.06865   *
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     573.6   1.881e-109
## 2    30     631.5   1.871e-114
## 3    40     668.1   1.605e-115
## 4    50     707.3   1.679e-117
## 
## 
## Elapsed time : 0.1471329
#plot(corn_garch, which = 1)

coef(corn_garch)
##           mu        omega       alpha1        beta1        shape 
## 3.629372e+00 3.250919e-02 9.989994e-01 2.010076e-08 9.999268e+01
# Forecasting

horizon <- 3

corn_forecast_garch <- ugarchforecast(corn_garch, n.ahead = horizon)

corn_forecast_mean <- as.numeric(corn_forecast_garch@forecast$seriesFor)
corn_actual_values <- as.numeric(window(corn_ts, start = c(1993, 1)))


#plot(corn_forecast_garch, n.plot = horizon, n.col = 1, plot.type = "single", 
    # main = "GARCH Forecast for corn Prices", ylab = "Price", xlab = "Time") #%>%
#lines(corn_ts[(length(corn_ts)-horizon+1):length(corn_ts)], col = "blue")



#Given the weak parameter significance above, try an AR(1) mean model instead

garch_model <- ugarchspec(variance.model = list(model = "sGARCH",
                                                garchOrder = c(1,1)),
                          mean.model = list(armaOrder = c(1,0),
                                            include.mean = TRUE),
                          distribution.model = "std")

corn_garch2 <- ugarchfit(spec = garch_model, data = corn_ts)
corn_garch2
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(1,0,0)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu       2.18098    0.141929  15.3666 0.000000
## ar1      0.99876    0.007648 130.5945 0.000000
## omega    0.00450    0.003260   1.3802 0.167538
## alpha1   0.23114    0.108458   2.1311 0.033078
## beta1    0.76786    0.108470   7.0790 0.000000
## shape    3.16313    0.534512   5.9178 0.000000
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu       2.18098    0.031853  68.4709 0.000000
## ar1      0.99876    0.008974 111.2930 0.000000
## omega    0.00450    0.004201   1.0712 0.284073
## alpha1   0.23114    0.130453   1.7718 0.076425
## beta1    0.76786    0.155690   4.9320 0.000001
## shape    3.16313    0.498626   6.3437 0.000000
## 
## LogLikelihood : 23.50113 
## 
## Information Criteria
## ------------------------------------
##                       
## Akaike       -0.096691
## Bayes        -0.032189
## Shibata      -0.097229
## Hannan-Quinn -0.071049
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic   p-value
## Lag[1]                      19.93 8.051e-06
## Lag[2*(p+q)+(p+q)-1][2]     19.96 0.000e+00
## Lag[4*(p+q)+(p+q)-1][5]     20.53 1.362e-08
## d.o.f=1
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                     0.1213  0.7276
## Lag[2*(p+q)+(p+q)-1][5]    0.4571  0.9641
## Lag[4*(p+q)+(p+q)-1][9]    1.6224  0.9450
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]   0.01666 0.500 2.000  0.8973
## ARCH Lag[5]   0.73569 1.440 1.667  0.8126
## ARCH Lag[7]   1.40213 2.315 1.543  0.8408
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  0.9697
## Individual Statistics:              
## mu     0.22975
## ar1    0.07387
## omega  0.26481
## alpha1 0.13692
## beta1  0.21882
## shape  0.18026
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.49 1.68 2.12
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           0.3529 0.7244    
## Negative Sign Bias  1.2854 0.1995    
## Positive Sign Bias  0.1674 0.8672    
## Joint Effect        1.8435 0.6055    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     28.61      0.07241
## 2    30     37.12      0.14328
## 3    40     47.28      0.17033
## 4    50     58.72      0.16115
## 
## 
## Elapsed time : 0.1233358
#Conditional variance (volatility) series from the standard GARCH(1,1) fit

corn_vol <- ts(corn_garch2@fit$sigma^2, start = c(1993), frequency = 12)

#plot(corn_vol,xlab="",ylab="",main="corn_Volatility (GARCH[1,1])")

#Exponential GARCH
Egarch_model <- ugarchspec(
  variance.model = list(model = "eGARCH", garchOrder = c(1, 1)),
  mean.model = list(armaOrder = c(1, 0), include.mean = TRUE),
  distribution.model = "std"
)

#Note: the original run passed garch_model (the sGARCH spec) here, which is why
#the output below repeats the sGARCH fit; Egarch_model is the intended spec.
corn_egarch2 <- ugarchfit(spec = Egarch_model, data = corn_ts)
corn_egarch2
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(1,0,0)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu       2.18098    0.141929  15.3666 0.000000
## ar1      0.99876    0.007648 130.5945 0.000000
## omega    0.00450    0.003260   1.3802 0.167538
## alpha1   0.23114    0.108458   2.1311 0.033078
## beta1    0.76786    0.108470   7.0790 0.000000
## shape    3.16313    0.534512   5.9178 0.000000
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu       2.18098    0.031853  68.4709 0.000000
## ar1      0.99876    0.008974 111.2930 0.000000
## omega    0.00450    0.004201   1.0712 0.284073
## alpha1   0.23114    0.130453   1.7718 0.076425
## beta1    0.76786    0.155690   4.9320 0.000001
## shape    3.16313    0.498626   6.3437 0.000000
## 
## LogLikelihood : 23.50113 
## 
## Information Criteria
## ------------------------------------
##                       
## Akaike       -0.096691
## Bayes        -0.032189
## Shibata      -0.097229
## Hannan-Quinn -0.071049
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic   p-value
## Lag[1]                      19.93 8.051e-06
## Lag[2*(p+q)+(p+q)-1][2]     19.96 0.000e+00
## Lag[4*(p+q)+(p+q)-1][5]     20.53 1.362e-08
## d.o.f=1
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                     0.1213  0.7276
## Lag[2*(p+q)+(p+q)-1][5]    0.4571  0.9641
## Lag[4*(p+q)+(p+q)-1][9]    1.6224  0.9450
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]   0.01666 0.500 2.000  0.8973
## ARCH Lag[5]   0.73569 1.440 1.667  0.8126
## ARCH Lag[7]   1.40213 2.315 1.543  0.8408
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  0.9697
## Individual Statistics:              
## mu     0.22975
## ar1    0.07387
## omega  0.26481
## alpha1 0.13692
## beta1  0.21882
## shape  0.18026
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.49 1.68 2.12
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           0.3529 0.7244    
## Negative Sign Bias  1.2854 0.1995    
## Positive Sign Bias  0.1674 0.8672    
## Joint Effect        1.8435 0.6055    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     28.61      0.07241
## 2    30     37.12      0.14328
## 3    40     47.28      0.17033
## 4    50     58.72      0.16115
## 
## 
## Elapsed time : 0.1288261
coef(corn_egarch2)
##          mu         ar1       omega      alpha1       beta1       shape 
## 2.180975972 0.998758590 0.004499884 0.231137911 0.767862088 3.163133182
#Conditional variance series from the eGARCH fit (identical to corn_vol in the
#run shown, because the sGARCH spec was reused)

ecorn_vol <- ts(corn_egarch2@fit$sigma^2, start = c(1993), frequency = 12)

#plot(ecorn_vol,xlab="",ylab="",main="corn_Volatility (eGARCH[1,1])")

cor(corn_vol, ecorn_vol)
## [1] 1
#ts.plot(corn_vol,ecorn_vol,col=c("green","red"),xlab="")
#legend("topright",legend=c("Standard","Exponential"),col=c("green","red"),lty=c(1,1))
#A correlation of 1 means the two volatility series are identical; in the run
#shown, the eGARCH fit reused the sGARCH spec, so no real comparison was made
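To actually compare the standard and exponential variance models, fit the eGARCH spec and put the information criteria side by side. A sketch, assuming `Egarch_model` and `corn_garch2` from above:

```r
# Lower AIC/BIC favors a model; infocriteria() returns the Akaike, Bayes,
# Shibata, and Hannan-Quinn criteria for a ugarchfit object.
egarch_fit <- ugarchfit(spec = Egarch_model, data = corn_ts)
cbind(sGARCH = infocriteria(corn_garch2), eGARCH = infocriteria(egarch_fit))
```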

#Inspecting the fitted GARCH object

names(corn_garch2@model)
##  [1] "modelinc"   "modeldesc"  "modeldata"  "pars"       "start.pars"
##  [6] "fixed.pars" "maxOrder"   "pos.matrix" "fmodel"     "pidx"      
## [11] "n.start"
names(corn_garch2@fit)
##  [1] "hessian"         "cvar"            "var"             "sigma"          
##  [5] "condH"           "z"               "LLH"             "log.likelihoods"
##  [9] "residuals"       "coef"            "robust.cvar"     "A"              
## [13] "B"               "scores"          "se.coef"         "tval"           
## [17] "matcoef"         "robust.se.coef"  "robust.tval"     "robust.matcoef" 
## [21] "fitted.values"   "convergence"     "kappa"           "persistence"    
## [25] "timer"           "ipars"           "solver"
#Variance
corn_garch_var <- corn_garch2@fit$var
#Residuals
corn_garch_res <- (corn_garch2@fit$residuals)^2
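As an aside, rugarch also provides extractor methods, so the slots above can be read without `@` access; a sketch against the same fit:

```r
# sigma() returns the conditional standard deviation and residuals() the
# mean-model residuals; squaring sigma() reproduces the conditional variance.
corn_sigma2 <- sigma(corn_garch2)^2
corn_resid2 <- residuals(corn_garch2)^2
head(corn_sigma2)
```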

#Plotting residuals and conditional variances
plot(corn_garch_res, type = "l")
lines(corn_garch_var, col = "green")

corn_forecast_garch2 <- ugarchforecast(corn_garch2, n.ahead = 12)
corn_forecast_garch2
## 
## *------------------------------------*
## *       GARCH Model Forecast         *
## *------------------------------------*
## Model: sGARCH
## Horizon: 12
## Roll Steps: 0
## Out of Sample: 0
## 
## 0-roll forecast [T0=Feb 2023]:
##      Series  Sigma
## T+1   6.774 0.3252
## T+2   6.768 0.3319
## T+3   6.763 0.3385
## T+4   6.757 0.3449
## T+5   6.751 0.3512
## T+6   6.745 0.3573
## T+7   6.740 0.3634
## T+8   6.734 0.3694
## T+9   6.728 0.3752
## T+10  6.723 0.3810
## T+11  6.717 0.3867
## T+12  6.712 0.3923
ug_corn <- corn_forecast_garch2@forecast$sigmaFor
plot(ug_corn, type = "l")

corn_var_t <- c(tail(corn_garch_var, 20), rep(NA, 12))  # last 20 observations, padded for the 12-step forecast
corn_res_t <- c(tail(corn_garch_res, 20), rep(NA, 12))  # last 20 observations
corn_f <- c(rep(NA, 20), ug_corn^2)  # squared sigma forecasts aligned after the observed window

plot(corn_res_t, type = "l") #Residuals
lines(corn_f, col = "orange") # Predictions 
lines(corn_var_t, col = "green") #Conditional Forecast
legend("topright", 
       legend = c("Residuals", "Predictions", "Conditional Forecast"), 
       col = c("black", "orange", "green"), 
       lty = c(1, 1, 1), 
       cex = 0.8)

#Plot Predictions
#plot(corn_forecast_garch2, main = "Forecasted corn Prices (GARCH(1,1))")
#legend("bottomright", legend = c("Mean", "Lower 95% CI", "Upper 95% CI"), col = c("black", "blue", "red"), lty = 1)

corn_mean_forecast <- as.numeric(corn_forecast_garch2@forecast$seriesFor)

28 Corn Model Comparison

#ARIMA Models
forecast::accuracy(corn_fit)
##                      ME      RMSE       MAE       MPE    MAPE      MASE
## Training set 0.01712893 0.2648581 0.1742406 0.2067825 4.61023 0.2193029
##                      ACF1
## Training set -0.002718111
forecast::accuracy(auto_corn)
##                      ME      RMSE       MAE       MPE    MAPE      MASE
## Training set 0.01097705 0.2726103 0.1791116 0.1083757 4.71269 0.2254337
##                     ACF1
## Training set 0.007004455
forecast::accuracy(corn_fitR)
##                      ME     RMSE       MAE       MPE    MAPE      MASE
## Training set 0.02756006 0.269654 0.1843457 0.4523231 3.98982 0.2603172
##                      ACF1
## Training set -0.002711066
forecast::accuracy(auto_cornR)
##                      ME      RMSE       MAE       MPE     MAPE      MASE
## Training set 0.02538322 0.2714618 0.1854949 0.4239981 4.019807 0.2619401
##                     ACF1
## Training set -0.02101573
AIC(corn_fit)
## [1] 86.11677
BIC(corn_fit)
## [1] 121.1167
AIC(corn_fitR)
## [1] 29.50332
BIC(corn_fitR)
## [1] 41.6574
#GARCH Model
#Note: the "actuals" below are taken from the start of the series (1993), not a
#held-out tail, and the 3-value actual vector is recycled against the 12-step
#mean forecast, so these error metrics are illustrative only.

corn_actual_values <- head(corn_actual_values, length(corn_forecast_mean))

mae <- mean(abs(corn_mean_forecast - corn_actual_values))
mse <- mean((corn_mean_forecast - corn_actual_values)^2)
rmse <- sqrt(mse)

# Print the results
cat(paste("MAE: ", mae, "\n"))
## MAE:  4.5589735838937
cat(paste("MSE: ", mse, "\n"))
## MSE:  20.7869575052084
cat(paste("RMSE: ", rmse, "\n"))
## RMSE:  4.55927159809639
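A more defensible out-of-sample check would roll one-step forecasts over a held-out tail rather than comparing against the start of the series; rugarch supports this directly. A sketch, with illustrative window settings:

```r
# Re-estimate every 6 months over a moving window, producing 1-step-ahead
# forecasts across the last 24 observations.
roll <- ugarchroll(garch_model, data = corn_ts,
                   n.ahead = 1, forecast.length = 24,
                   refit.every = 6, refit.window = "moving")
report(roll, type = "fpm")  # forecast performance measures (MSE, MAE, DAC)
```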
forecast_mean_corn2 <- as.numeric(corn_forecast_garch2@forecast$seriesFor)
actual_values_corn2 <- as.numeric(window(corn_ts, start = c(1993, 1)))
actual_values_corn2 <- head(actual_values_corn2, length(forecast_mean_corn2))

corn_mae <- mean(abs(forecast_mean_corn2 - actual_values_corn2))
corn_mse <- mean((forecast_mean_corn2 - actual_values_corn2)^2)
corn_rmse <- sqrt(corn_mse)
# Print the results
cat(paste("MAE: ", corn_mae, "\n"))
## MAE:  4.3509781922319
cat(paste("MSE: ", corn_mse, "\n"))
## MSE:  18.9941872437891
cat(paste("RMSE: ", corn_rmse, "\n"))
## RMSE:  4.35823212366999
corn_garch
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      3.629372    0.029175 124.3990  0.00000
## omega   0.032509    0.005407   6.0122  0.00000
## alpha1  0.998999    0.047015  21.2486  0.00000
## beta1   0.000000    0.080373   0.0000  1.00000
## shape  99.992685   68.196361   1.4662  0.14258
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      3.629372    0.055652  65.2160 0.000000
## omega   0.032509    0.009629   3.3762 0.000735
## alpha1  0.998999    0.119584   8.3540 0.000000
## beta1   0.000000    0.118365   0.0000 1.000000
## shape  99.992685   15.275796   6.5458 0.000000
## 
## LogLikelihood : -456.3449 
## 
## Information Criteria
## ------------------------------------
##                    
## Akaike       2.5489
## Bayes        2.6026
## Shibata      2.5485
## Hannan-Quinn 2.5702
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      261.4       0
## Lag[2*(p+q)+(p+q)-1][2]     357.4       0
## Lag[4*(p+q)+(p+q)-1][5]     580.3       0
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic   p-value
## Lag[1]                      13.09 0.0002968
## Lag[2*(p+q)+(p+q)-1][5]     14.17 0.0007417
## Lag[4*(p+q)+(p+q)-1][9]     16.46 0.0015718
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]    0.8891 0.500 2.000  0.3457
## ARCH Lag[5]    1.3093 1.440 1.667  0.6439
## ARCH Lag[7]    2.7544 2.315 1.543  0.5611
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  47.7516
## Individual Statistics:              
## mu      3.7398
## omega   0.2365
## alpha1  0.5115
## beta1   1.1196
## shape  29.9616
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.28 1.47 1.88
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value    prob sig
## Sign Bias            2.489 0.01328  **
## Negative Sign Bias   1.138 0.25570    
## Positive Sign Bias   1.172 0.24180    
## Joint Effect         7.104 0.06865   *
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     573.6   1.881e-109
## 2    30     631.5   1.871e-114
## 3    40     668.1   1.605e-115
## 4    50     707.3   1.679e-117
## 
## 
## Elapsed time : 0.1471329
corn_garch2
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(1,0,0)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu       2.18098    0.141929  15.3666 0.000000
## ar1      0.99876    0.007648 130.5945 0.000000
## omega    0.00450    0.003260   1.3802 0.167538
## alpha1   0.23114    0.108458   2.1311 0.033078
## beta1    0.76786    0.108470   7.0790 0.000000
## shape    3.16313    0.534512   5.9178 0.000000
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu       2.18098    0.031853  68.4709 0.000000
## ar1      0.99876    0.008974 111.2930 0.000000
## omega    0.00450    0.004201   1.0712 0.284073
## alpha1   0.23114    0.130453   1.7718 0.076425
## beta1    0.76786    0.155690   4.9320 0.000001
## shape    3.16313    0.498626   6.3437 0.000000
## 
## LogLikelihood : 23.50113 
## 
## Information Criteria
## ------------------------------------
##                       
## Akaike       -0.096691
## Bayes        -0.032189
## Shibata      -0.097229
## Hannan-Quinn -0.071049
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic   p-value
## Lag[1]                      19.93 8.051e-06
## Lag[2*(p+q)+(p+q)-1][2]     19.96 0.000e+00
## Lag[4*(p+q)+(p+q)-1][5]     20.53 1.362e-08
## d.o.f=1
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                     0.1213  0.7276
## Lag[2*(p+q)+(p+q)-1][5]    0.4571  0.9641
## Lag[4*(p+q)+(p+q)-1][9]    1.6224  0.9450
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]   0.01666 0.500 2.000  0.8973
## ARCH Lag[5]   0.73569 1.440 1.667  0.8126
## ARCH Lag[7]   1.40213 2.315 1.543  0.8408
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  0.9697
## Individual Statistics:              
## mu     0.22975
## ar1    0.07387
## omega  0.26481
## alpha1 0.13692
## beta1  0.21882
## shape  0.18026
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.49 1.68 2.12
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           0.3529 0.7244    
## Negative Sign Bias  1.2854 0.1995    
## Positive Sign Bias  0.1674 0.8672    
## Joint Effect        1.8435 0.6055    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     28.61      0.07241
## 2    30     37.12      0.14328
## 3    40     47.28      0.17033
## 4    50     58.72      0.16115
## 
## 
## Elapsed time : 0.1233358

29 Exploring Cotton

# Looking at the price of Cotton over the last 30 years
ggplot(cotton, aes(date, price)) +
  geom_line() +
  labs(title = "Cotton Price by Day")

#Summary of Cotton
summary(cotton)
##       date                price           month          
##  Min.   :1993-01-04   Min.   :0.2852   Length:7586       
##  1st Qu.:2000-07-27   1st Qu.:0.5757   Class :character  
##  Median :2008-03-10   Median :0.6839   Mode  :character  
##  Mean   :2008-02-21   Mean   :0.7143                     
##  3rd Qu.:2015-09-16   3rd Qu.:0.8070                     
##  Max.   :2023-02-17   Max.   :2.1414
#Cotton monthly average

cotton_m <- cotton %>%
  group_by(month) %>%
  mutate(mon_avg = mean(price)) %>%
  select(month, mon_avg)

#Drop Duplicate rows Cotton

cotton_m <- cotton_m[!duplicated(cotton_m),]
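The group-then-drop-duplicates pattern can also be written in one step with summarise, which returns one row per month directly; a sketch (the `cotton_m_alt` name is illustrative):

```r
# One row per month; note summarise() orders rows by the grouping key, so
# check the row order matches the original before building the ts object.
cotton_m_alt <- cotton %>%
  group_by(month) %>%
  summarise(mon_avg = mean(price), .groups = "drop")
```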

#Creating Time series data Cotton

cotton_ts <- ts(cotton_m$mon_avg,start=c(1993),frequency=12)

#Plotting Cotton

chartSeries(cotton_ts)

## Huge price hike in 2012

#Looking at trends
autoplot(decompose((cotton_ts)),main="") 

#Looking for daily or weekly trends Cotton
cotton_r <- cotton %>%
  filter(date >= as.Date('2022-01-01') & date <= as.Date('2022-12-31'))

ggplot(cotton_r, aes(date, price)) +
  geom_line() +
  labs(title = "Cotton Prices Daily") + xlab("time") + ylab("prices")

## Price does not seem to fluctuate by day of the week

30 Cotton ARIMA models

##Stationary Test
adf.test(cotton_ts, alternative = "stationary")
## 
##  Augmented Dickey-Fuller Test
## 
## data:  cotton_ts
## Dickey-Fuller = -3.6639, Lag order = 7, p-value = 0.02708
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(cotton_ts), alternative ="stationary")
## Warning in adf.test(diff(cotton_ts), alternative = "stationary"): p-value
## smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(cotton_ts)
## Dickey-Fuller = -8.0778, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
#Differencing lowers the p-value further, but the undifferenced series is
#already stationary at the 5% level, so d = 0 is reasonable here
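The differencing decision can be cross-checked automatically with `forecast::ndiffs`, which applies unit-root tests to suggest d; a sketch:

```r
# ndiffs() returns the estimated number of differences required for
# stationarity (KPSS test by default).
forecast::ndiffs(cotton_ts)
```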

#Custom Arima Model

#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(cotton_ts),main='')

#q=6

#Looking at the impact prior-period prices have on subsequent periods
#PACF (p)
pacf(diff(cotton_ts),main='')

#p = 4: four significant partial autocorrelation lags

# Custom ARIMA (best fit)

cotton_fit<- Arima(cotton_ts, order=c(4,0,6))
cotton_fit
## Series: cotton_ts 
## ARIMA(4,0,6) with non-zero mean 
## 
## Coefficients:
##           ar1     ar2     ar3     ar4     ma1     ma2     ma3     ma4     ma5
##       -0.1497  0.0744  0.6415  0.1818  1.3282  1.3555  0.7720  0.4236  0.1456
## s.e.   0.3350  0.0994  0.1921  0.2557  0.3312  0.3997  0.4156  0.2321  0.1993
##           ma6   mean
##       -0.0667  0.716
## s.e.   0.1089  0.058
## 
## sigma^2 = 0.003459:  log likelihood = 516.02
## AIC=-1008.04   AICc=-1007.15   BIC=-961.34
#Check residuals

checkresiduals(cotton_fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(4,0,6) with non-zero mean
## Q* = 21.874, df = 14, p-value = 0.08124
## 
## Model df: 10.   Total lags used: 24
#Auto-fit Arima

auto_cotton<- auto.arima(cotton_ts)
auto_cotton
## Series: cotton_ts 
## ARIMA(1,1,1) 
## 
## Coefficients:
##          ar1      ma1
##       0.5245  -0.3203
## s.e.  0.1458   0.1594
## 
## sigma^2 = 0.003706:  log likelihood = 499.12
## AIC=-992.25   AICc=-992.18   BIC=-980.58
checkresiduals(auto_cotton)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,1,1)
## Q* = 43.336, df = 22, p-value = 0.00429
## 
## Model df: 2.   Total lags used: 24
##Forecast Plots

##Forecast Custom

autoplot(forecast::forecast(cotton_fit, h=12, level=c(80,95))) #(better model)

##Forecast Auto 

autoplot(forecast::forecast(auto_cotton, h=12, level=c(80,95)))#(overfit)

#ARIMA using more recent data

cotton_r2 <- cotton %>%
  filter(date >= as.Date('2016-01-01') & date <= as.Date('2023-01-31'))

cotton_r2 <- cotton_r2 %>%
  group_by(month) %>%
  mutate(mon_avg = mean(price)) %>%
  select(month, mon_avg)

#Drop Duplicate rows

cotton_r2 <- cotton_r2[!duplicated(cotton_r2),]

cotton_tsr <- ts(cotton_r2$mon_avg,start=c(2016),frequency=12)

##Stationary Test
adf.test(cotton_tsr, alternative = "stationary")
## 
##  Augmented Dickey-Fuller Test
## 
## data:  cotton_tsr
## Dickey-Fuller = -2.5097, Lag order = 4, p-value = 0.3667
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(cotton_tsr), alternative ="stationary")
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(cotton_tsr)
## Dickey-Fuller = -3.7655, Lag order = 4, p-value = 0.02446
## alternative hypothesis: stationary
#Differencing brings the p-value below 0.05, so d = 1

#Custom Arima Model

#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(cotton_tsr),main='')

#q=1

#Looking at the impact prior-period prices have on subsequent periods
#PACF (p)
pacf(diff(cotton_tsr),main='')

#p = 1: one significant partial autocorrelation lag

# Custom ARIMA (overfit)

cotton_fitR<- Arima(cotton_tsr, order=c(1,1,1))
cotton_fitR
## Series: cotton_tsr 
## ARIMA(1,1,1) 
## 
## Coefficients:
##           ar1     ma1
##       -0.6098  0.8354
## s.e.   0.1782  0.1261
## 
## sigma^2 = 0.003624:  log likelihood = 117.75
## AIC=-229.5   AICc=-229.2   BIC=-222.21
#Check residuals

checkresiduals(cotton_fitR)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,1,1)
## Q* = 16.158, df = 15, p-value = 0.3717
## 
## Model df: 2.   Total lags used: 17
#Auto-fit Arima(overfit)

auto_cottonR<- auto.arima(cotton_tsr)
auto_cottonR
## Series: cotton_tsr 
## ARIMA(1,1,1) 
## 
## Coefficients:
##           ar1     ma1
##       -0.6098  0.8354
## s.e.   0.1782  0.1261
## 
## sigma^2 = 0.003624:  log likelihood = 117.75
## AIC=-229.5   AICc=-229.2   BIC=-222.21
checkresiduals(auto_cottonR)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,1,1)
## Q* = 16.158, df = 15, p-value = 0.3717
## 
## Model df: 2.   Total lags used: 17
##Forecast Plot

autoplot(forecast::forecast(auto_cottonR, h=12, level=c(80,95)))

##Forecast Custom

autoplot(forecast::forecast(cotton_fitR, h=12, level=c(80,95)))

#Printing Predictions

cotton_predictions <- forecast::forecast(cotton_fitR,h=12)

print(cotton_predictions$mean)
##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 2023           0.8309998 0.8398149 0.8344391 0.8377175 0.8357182 0.8369374
## 2024 0.8364993                                                            
##            Aug       Sep       Oct       Nov       Dec
## 2023 0.8361939 0.8366473 0.8363708 0.8365394 0.8364366
## 2024
cotton_predictions <- forecast::forecast(auto_cottonR,h=12)

print(cotton_predictions$mean)
##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 2023           0.8309998 0.8398149 0.8344391 0.8377175 0.8357182 0.8369374
## 2024 0.8364993                                                            
##            Aug       Sep       Oct       Nov       Dec
## 2023 0.8361939 0.8366473 0.8363708 0.8365394 0.8364366
## 2024

31 GARCH Cotton Modeling

# Model Creation
garch_model <- ugarchspec(
  variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
  mean.model = list(armaOrder = c(0, 0), include.mean = TRUE),
  distribution.model = "std"
)
# Fitting Model to Data

cotton_garch <- ugarchfit(spec = garch_model, data = cotton_ts)

print(cotton_garch)
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##          Estimate  Std. Error  t value Pr(>|t|)
## mu       0.703309    0.009548  73.6627 0.000000
## omega    0.001195    0.000339   3.5213 0.000429
## alpha1   0.999000    0.108225   9.2308 0.000000
## beta1    0.000000    0.052273   0.0000 1.000000
## shape   99.999979   65.555380   1.5254 0.127153
## 
## Robust Standard Errors:
##          Estimate  Std. Error  t value Pr(>|t|)
## mu       0.703309    0.042439  16.5722  0.00000
## omega    0.001195    0.000804   1.4866  0.13712
## alpha1   0.999000    0.106978   9.3384  0.00000
## beta1    0.000000    0.027784   0.0000  1.00000
## shape   99.999979   12.121320   8.2499  0.00000
## 
## LogLikelihood : 256.0808 
## 
## Information Criteria
## ------------------------------------
##                     
## Akaike       -1.3872
## Bayes        -1.3334
## Shibata      -1.3876
## Hannan-Quinn -1.3658
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      268.0       0
## Lag[2*(p+q)+(p+q)-1][2]     373.8       0
## Lag[4*(p+q)+(p+q)-1][5]     629.9       0
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic  p-value
## Lag[1]                      10.13 0.001457
## Lag[2*(p+q)+(p+q)-1][5]     12.12 0.002596
## Lag[4*(p+q)+(p+q)-1][9]     14.30 0.005270
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]    0.0149 0.500 2.000  0.9028
## ARCH Lag[5]    4.2406 1.440 1.667  0.1534
## ARCH Lag[7]    4.9004 2.315 1.543  0.2348
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  34.1067
## Individual Statistics:               
## mu      0.41257
## omega   0.05232
## alpha1  0.21244
## beta1   0.27042
## shape  21.91731
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.28 1.47 1.88
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias          0.04528 0.9639    
## Negative Sign Bias 0.52471 0.6001    
## Positive Sign Bias 0.04531 0.9639    
## Joint Effect       0.46610 0.9263    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     267.1    1.066e-45
## 2    30     277.4    2.253e-42
## 3    40     288.6    7.807e-40
## 4    50     289.7    7.205e-36
## 
## 
## Elapsed time : 0.157939
#plot(cotton_garch, which = 1)

coef(cotton_garch)
##           mu        omega       alpha1        beta1        shape 
## 7.033091e-01 1.195185e-03 9.989999e-01 1.157115e-08 9.999998e+01
# Forecasting

horizon <- 3

cotton_forecast_garch <- ugarchforecast(cotton_garch, n.ahead = horizon)

forecast_mean_cotton <- as.numeric(cotton_forecast_garch@forecast$seriesFor)
actual_values_cotton <- as.numeric(window(cotton_ts, start = c(1993, 1)))


#plot(cotton_forecast_garch, n.plot = horizon, n.col = 1, plot.type = "single", 
    # main = "GARCH Forecast for cotton Prices", ylab = "Price", xlab = "Time") #%>%
#lines(cotton_ts[(length(cotton_ts)-horizon+1):length(cotton_ts)], col = "blue")


#Based on the significance results above, let's refit with the auto.arima mean order (AR(1))

garch_model <- ugarchspec(
  variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
  mean.model = list(armaOrder = c(1, 0), include.mean = TRUE),
  distribution.model = "std"
)

cotton_garch2 <- ugarchfit(spec = garch_model, data = cotton_ts)
cotton_garch2
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(1,0,0)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.620592    0.031862  19.4772 0.000000
## ar1     0.974716    0.015198  64.1350 0.000000
## omega   0.000651    0.000196   3.3136 0.000921
## alpha1  0.645926    0.196116   3.2936 0.000989
## beta1   0.251608    0.117291   2.1452 0.031940
## shape   4.978376    1.417768   3.5114 0.000446
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.620592    0.023988  25.8709 0.000000
## ar1     0.974716    0.020674  47.1470 0.000000
## omega   0.000651    0.000244   2.6709 0.007565
## alpha1  0.645926    0.222294   2.9057 0.003664
## beta1   0.251608    0.171291   1.4689 0.141862
## shape   4.978376    1.474353   3.3767 0.000734
## 
## LogLikelihood : 619.2135 
## 
## Information Criteria
## ------------------------------------
##                     
## Akaike       -3.3879
## Bayes        -3.3234
## Shibata      -3.3885
## Hannan-Quinn -3.3623
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic   p-value
## Lag[1]                      13.12 2.919e-04
## Lag[2*(p+q)+(p+q)-1][2]     13.87 3.741e-14
## Lag[4*(p+q)+(p+q)-1][5]     16.25 1.551e-06
## d.o.f=1
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                     0.3671  0.5446
## Lag[2*(p+q)+(p+q)-1][5]    1.1923  0.8148
## Lag[4*(p+q)+(p+q)-1][9]    2.5292  0.8334
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]     1.173 0.500 2.000  0.2788
## ARCH Lag[5]     1.222 1.440 1.667  0.6684
## ARCH Lag[7]     2.415 2.315 1.543  0.6300
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  1.5523
## Individual Statistics:              
## mu     0.99815
## ar1    0.09716
## omega  0.20910
## alpha1 0.13640
## beta1  0.24576
## shape  0.20950
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.49 1.68 2.12
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           0.6637 0.5073    
## Negative Sign Bias  0.1623 0.8712    
## Positive Sign Bias  0.4003 0.6892    
## Joint Effect        1.5536 0.6700    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     17.78       0.5372
## 2    30     22.70       0.7903
## 3    40     29.60       0.8614
## 4    50     36.90       0.8983
## 
## 
## Elapsed time : 0.122936
# Conditional variance (sigma^2) series from the standard GARCH(1,1) fit

cotton_vol <- ts(cotton_garch2@fit$sigma^2, start = c(1993), frequency = 12)

#plot(cotton_vol, xlab = "", ylab = "", main = "Cotton Volatility (GARCH[1,1])")

#Exponential GARCH
Egarch_model <- ugarchspec(variance.model = list(model = "eGARCH",
                                                 garchOrder = c(1,1)),
                           mean.model = list(armaOrder = c(1,0),
                                             include.mean = TRUE),
                           distribution.model = "std")

# NOTE: spec = garch_model below refits the sGARCH(1,1) model, not the eGARCH
# spec just defined, so the output that follows duplicates the earlier fit
cotton_egarch2 <- ugarchfit(spec = garch_model, data = cotton_ts)
cotton_egarch2
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(1,0,0)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.620592    0.031862  19.4772 0.000000
## ar1     0.974716    0.015198  64.1350 0.000000
## omega   0.000651    0.000196   3.3136 0.000921
## alpha1  0.645926    0.196116   3.2936 0.000989
## beta1   0.251608    0.117291   2.1452 0.031940
## shape   4.978376    1.417768   3.5114 0.000446
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.620592    0.023988  25.8709 0.000000
## ar1     0.974716    0.020674  47.1470 0.000000
## omega   0.000651    0.000244   2.6709 0.007565
## alpha1  0.645926    0.222294   2.9057 0.003664
## beta1   0.251608    0.171291   1.4689 0.141862
## shape   4.978376    1.474353   3.3767 0.000734
## 
## LogLikelihood : 619.2135 
## 
## Information Criteria
## ------------------------------------
##                     
## Akaike       -3.3879
## Bayes        -3.3234
## Shibata      -3.3885
## Hannan-Quinn -3.3623
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic   p-value
## Lag[1]                      13.12 2.919e-04
## Lag[2*(p+q)+(p+q)-1][2]     13.87 3.741e-14
## Lag[4*(p+q)+(p+q)-1][5]     16.25 1.551e-06
## d.o.f=1
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                     0.3671  0.5446
## Lag[2*(p+q)+(p+q)-1][5]    1.1923  0.8148
## Lag[4*(p+q)+(p+q)-1][9]    2.5292  0.8334
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]     1.173 0.500 2.000  0.2788
## ARCH Lag[5]     1.222 1.440 1.667  0.6684
## ARCH Lag[7]     2.415 2.315 1.543  0.6300
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  1.5523
## Individual Statistics:              
## mu     0.99815
## ar1    0.09716
## omega  0.20910
## alpha1 0.13640
## beta1  0.24576
## shape  0.20950
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.49 1.68 2.12
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           0.6637 0.5073    
## Negative Sign Bias  0.1623 0.8712    
## Positive Sign Bias  0.4003 0.6892    
## Joint Effect        1.5536 0.6700    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     17.78       0.5372
## 2    30     22.70       0.7903
## 3    40     29.60       0.8614
## 4    50     36.90       0.8983
## 
## 
## Elapsed time : 0.1272421
coef(cotton_egarch2)
##           mu          ar1        omega       alpha1        beta1        shape 
## 0.6205920349 0.9747158449 0.0006507369 0.6459258710 0.2516076142 4.9783760600
# Conditional variance (sigma^2) series from the second fit

ecotton_vol <- ts(cotton_egarch2@fit$sigma^2, start = c(1993), frequency = 12)

#plot(ecotton_vol, xlab = "", ylab = "", main = "Cotton Volatility (eGARCH[1,1])")

cor(cotton_vol, ecotton_vol)
## [1] 1
ts.plot(cotton_vol,ecotton_vol,col=c("green","red"),xlab="")
legend("topright",legend=c("Standard","Exponential"),col=c("green","red"),lty=c(1,1))

#The correlation of 1 and the identical lines confirm these are the same fit:
#the eGARCH call above reused the sGARCH spec, so no true eGARCH comparison
#was actually made here
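A minimal sketch of the comparison that was likely intended, assuming the `Egarch_model` spec, the `cotton_garch2` fit, and the `cotton_ts` and `cotton_vol` objects defined above:

```r
# Fit the eGARCH spec that was defined above (the earlier call passed
# garch_model, so both fitted objects were the same sGARCH model)
cotton_egarch_fix <- ugarchfit(spec = Egarch_model, data = cotton_ts)

# Compare the two variance specifications on information criteria
infocriteria(cotton_garch2)
infocriteria(cotton_egarch_fix)

# Compare the fitted conditional variances; with a genuine eGARCH fit this
# correlation should no longer be exactly 1
ecotton_vol_fix <- ts(sigma(cotton_egarch_fix)^2, start = c(1993), frequency = 12)
cor(cotton_vol, ecotton_vol_fix)
```

The `sigma()` and `infocriteria()` accessors are the rugarch-recommended alternatives to reaching into the `@fit` slot directly.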

#GARCH 3

names(cotton_garch2@model)
##  [1] "modelinc"   "modeldesc"  "modeldata"  "pars"       "start.pars"
##  [6] "fixed.pars" "maxOrder"   "pos.matrix" "fmodel"     "pidx"      
## [11] "n.start"
names(cotton_garch2@fit)
##  [1] "hessian"         "cvar"            "var"             "sigma"          
##  [5] "condH"           "z"               "LLH"             "log.likelihoods"
##  [9] "residuals"       "coef"            "robust.cvar"     "A"              
## [13] "B"               "scores"          "se.coef"         "tval"           
## [17] "matcoef"         "robust.se.coef"  "robust.tval"     "robust.matcoef" 
## [21] "fitted.values"   "convergence"     "kappa"           "persistence"    
## [25] "timer"           "ipars"           "solver"
#Variance
cotton_garch_var <- cotton_garch2@fit$var
#Residuals
cotton_garch_res <- (cotton_garch2@fit$residuals)^2

#Plotting residuals and conditional variances
plot(cotton_garch_res, type = "l")
lines(cotton_garch_var, col = "green")

cotton_forecast_garch2 <- ugarchforecast(cotton_garch2, n.ahead = 12)
cotton_forecast_garch2
## 
## *------------------------------------*
## *       GARCH Model Forecast         *
## *------------------------------------*
## Model: sGARCH
## Horizon: 12
## Roll Steps: 0
## Out of Sample: 0
## 
## 0-roll forecast [T0=Feb 2023]:
##      Series   Sigma
## T+1  0.8382 0.03149
## T+2  0.8327 0.03925
## T+3  0.8273 0.04510
## T+4  0.8221 0.04976
## T+5  0.8170 0.05360
## T+6  0.8120 0.05683
## T+7  0.8072 0.05958
## T+8  0.8025 0.06194
## T+9  0.7979 0.06398
## T+10 0.7934 0.06577
## T+11 0.7890 0.06733
## T+12 0.7848 0.06870
ug_cotton <- cotton_forecast_garch2@forecast$sigmaFor
plot(ug_cotton, type = "l")

cotton_var_t <- c(tail(cotton_garch_var,20),rep(NA,12))  # last 20 fitted variances, padded for the 12 forecast steps
cotton_res_t <- c(tail(cotton_garch_res,20),rep(NA,12))  # last 20 squared residuals, padded to match
cotton_f <- c(rep(NA,20),(ug_cotton)^2)                  # 12 forecast variances, offset past the observed window

plot(cotton_res_t, type = "l")   # Squared residuals
lines(cotton_f, col = "orange")  # Forecast conditional variance
lines(cotton_var_t, col = "green") # Fitted conditional variance
legend("topright", 
       legend = c("Squared Residuals", "Forecast Variance", "Fitted Variance"), 
       col = c("black", "orange", "green"), 
       lty = c(1, 1, 1), 
       cex = 0.8)

#Plot Predictions
#plot(cotton_forecast_garch2, main = "Forecasted cotton Prices (GARCH(1,1))")


cotton_mean_forecast <- as.numeric(cotton_forecast_garch2@forecast$seriesFor)


#ARIMA Models
forecast::accuracy(cotton_fit)
##                        ME       RMSE        MAE        MPE     MAPE      MASE
## Training set 0.0002759028 0.05790952 0.03712278 -0.5037232 5.077883 0.2186312
##                     ACF1
## Training set 0.001309187
forecast::accuracy(auto_cotton)
##                        ME      RMSE        MAE         MPE     MAPE      MASE
## Training set 0.0004544867 0.0606267 0.03717778 -0.07564934 5.035075 0.2189551
##                      ACF1
## Training set -0.002240768
forecast::accuracy(cotton_fitR)
##                       ME       RMSE        MAE       MPE     MAPE      MASE
## Training set 0.002263093 0.05912699 0.03867624 0.1458866 4.549521 0.2245412
##                    ACF1
## Training set -0.0242505
AIC(cotton_fit)
## [1] -1008.04
BIC(cotton_fit)
## [1] -961.3404
AIC(cotton_fitR)
## [1] -229.5022
BIC(cotton_fitR)
## [1] -222.2097
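To make the side-by-side comparison easier to read, the accuracy and information-criterion calls above can be gathered into a single table. A minimal sketch, assuming the three fitted ARIMA objects (`cotton_fit`, `auto_cotton`, `cotton_fitR`) above:

```r
# Collect the headline metrics for each candidate model into one data frame,
# then sort by training RMSE (lower is better on every column here)
model_compare <- data.frame(
  model = c("cotton_fit", "auto_cotton", "cotton_fitR"),
  RMSE  = c(forecast::accuracy(cotton_fit)[, "RMSE"],
            forecast::accuracy(auto_cotton)[, "RMSE"],
            forecast::accuracy(cotton_fitR)[, "RMSE"]),
  MAE   = c(forecast::accuracy(cotton_fit)[, "MAE"],
            forecast::accuracy(auto_cotton)[, "MAE"],
            forecast::accuracy(cotton_fitR)[, "MAE"]),
  AIC   = c(AIC(cotton_fit), AIC(auto_cotton), AIC(cotton_fitR)),
  BIC   = c(BIC(cotton_fit), BIC(auto_cotton), BIC(cotton_fitR))
)
model_compare[order(model_compare$RMSE), ]
```

Note that AIC/BIC are only comparable between models fit to the same data, which is why `cotton_fitR` (fit to the shorter recent window) has much higher values despite similar error metrics.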
#GARCH Model

actual_values_cotton <- head(actual_values_cotton, length(cotton_mean_forecast))

mae <- mean(abs(forecast_mean_cotton - actual_values_cotton))
mse <- mean((forecast_mean_cotton - actual_values_cotton)^2)
rmse <- sqrt(mse)

# Print the results
cat(paste("MAE: ", mae, "\n"))
## MAE:  0.103256141089996
cat(paste("MSE: ", mse, "\n"))
## MSE:  0.0112779322589007
cat(paste("RMSE: ", rmse, "\n"))
## RMSE:  0.106197609478277
#GARCH Model 2
forecast_mean_cotton2 <- as.numeric(cotton_forecast_garch2@forecast$seriesFor)
# NOTE: these "actuals" are the first 12 observations of the series (from 1993),
# not held-out data after the Feb 2023 forecast origin, so the error metrics
# below should not be read as out-of-sample accuracy
actual_values_cotton2 <- as.numeric(window(cotton_ts, start = c(1993, 1)))
actual_values_cotton2 <- head(actual_values_cotton2, length(forecast_mean_cotton2))

cotton_mae <- mean(abs(forecast_mean_cotton2 - actual_values_cotton2))
cotton_mse <- mean((forecast_mean_cotton2 - actual_values_cotton2)^2)
cotton_rmse <- sqrt(cotton_mse)
# Print the results
cat(paste("MAE: ", cotton_mae, "\n"))
## MAE:  0.210298087834802
cat(paste("MSE: ", cotton_mse, "\n"))
## MSE:  0.0448996204296999
cat(paste("RMSE: ", cotton_rmse, "\n"))
## RMSE:  0.211895305350779
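Since the metrics above score the forecast against the start of the series rather than a holdout period, a sketch of a proper out-of-sample check follows, assuming `cotton_ts` is the monthly series used above (running through early 2023) and `garch_model` is the sGARCH spec defined earlier:

```r
# Hold out the final 12+ months, refit on the training window only,
# then score the 12-step-ahead forecast against the held-out observations
cotton_train <- window(cotton_ts, end = c(2021, 12))
cotton_test  <- window(cotton_ts, start = c(2022, 1))

holdout_fit <- ugarchfit(spec = garch_model, data = cotton_train)
holdout_fc  <- ugarchforecast(holdout_fit, n.ahead = 12)
fc_mean     <- as.numeric(holdout_fc@forecast$seriesFor)

actual_12 <- as.numeric(head(cotton_test, 12))
sqrt(mean((fc_mean - actual_12)^2))  # holdout RMSE
mean(abs(fc_mean - actual_12))       # holdout MAE
```

These holdout metrics are the ones that should be compared against the ARIMA training-set accuracy above when deferring to a best model.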
cotton_garch
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##          Estimate  Std. Error  t value Pr(>|t|)
## mu       0.703309    0.009548  73.6627 0.000000
## omega    0.001195    0.000339   3.5213 0.000429
## alpha1   0.999000    0.108225   9.2308 0.000000
## beta1    0.000000    0.052273   0.0000 1.000000
## shape   99.999979   65.555380   1.5254 0.127153
## 
## Robust Standard Errors:
##          Estimate  Std. Error  t value Pr(>|t|)
## mu       0.703309    0.042439  16.5722  0.00000
## omega    0.001195    0.000804   1.4866  0.13712
## alpha1   0.999000    0.106978   9.3384  0.00000
## beta1    0.000000    0.027784   0.0000  1.00000
## shape   99.999979   12.121320   8.2499  0.00000
## 
## LogLikelihood : 256.0808 
## 
## Information Criteria
## ------------------------------------
##                     
## Akaike       -1.3872
## Bayes        -1.3334
## Shibata      -1.3876
## Hannan-Quinn -1.3658
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      268.0       0
## Lag[2*(p+q)+(p+q)-1][2]     373.8       0
## Lag[4*(p+q)+(p+q)-1][5]     629.9       0
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic  p-value
## Lag[1]                      10.13 0.001457
## Lag[2*(p+q)+(p+q)-1][5]     12.12 0.002596
## Lag[4*(p+q)+(p+q)-1][9]     14.30 0.005270
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]    0.0149 0.500 2.000  0.9028
## ARCH Lag[5]    4.2406 1.440 1.667  0.1534
## ARCH Lag[7]    4.9004 2.315 1.543  0.2348
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  34.1067
## Individual Statistics:               
## mu      0.41257
## omega   0.05232
## alpha1  0.21244
## beta1   0.27042
## shape  21.91731
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.28 1.47 1.88
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias          0.04528 0.9639    
## Negative Sign Bias 0.52471 0.6001    
## Positive Sign Bias 0.04531 0.9639    
## Joint Effect       0.46610 0.9263    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     267.1    1.066e-45
## 2    30     277.4    2.253e-42
## 3    40     288.6    7.807e-40
## 4    50     289.7    7.205e-36
## 
## 
## Elapsed time : 0.157939
cotton_garch2
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(1,0,0)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.620592    0.031862  19.4772 0.000000
## ar1     0.974716    0.015198  64.1350 0.000000
## omega   0.000651    0.000196   3.3136 0.000921
## alpha1  0.645926    0.196116   3.2936 0.000989
## beta1   0.251608    0.117291   2.1452 0.031940
## shape   4.978376    1.417768   3.5114 0.000446
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.620592    0.023988  25.8709 0.000000
## ar1     0.974716    0.020674  47.1470 0.000000
## omega   0.000651    0.000244   2.6709 0.007565
## alpha1  0.645926    0.222294   2.9057 0.003664
## beta1   0.251608    0.171291   1.4689 0.141862
## shape   4.978376    1.474353   3.3767 0.000734
## 
## LogLikelihood : 619.2135 
## 
## Information Criteria
## ------------------------------------
##                     
## Akaike       -3.3879
## Bayes        -3.3234
## Shibata      -3.3885
## Hannan-Quinn -3.3623
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic   p-value
## Lag[1]                      13.12 2.919e-04
## Lag[2*(p+q)+(p+q)-1][2]     13.87 3.741e-14
## Lag[4*(p+q)+(p+q)-1][5]     16.25 1.551e-06
## d.o.f=1
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                     0.3671  0.5446
## Lag[2*(p+q)+(p+q)-1][5]    1.1923  0.8148
## Lag[4*(p+q)+(p+q)-1][9]    2.5292  0.8334
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]     1.173 0.500 2.000  0.2788
## ARCH Lag[5]     1.222 1.440 1.667  0.6684
## ARCH Lag[7]     2.415 2.315 1.543  0.6300
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  1.5523
## Individual Statistics:              
## mu     0.99815
## ar1    0.09716
## omega  0.20910
## alpha1 0.13640
## beta1  0.24576
## shape  0.20950
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.49 1.68 2.12
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           0.6637 0.5073    
## Negative Sign Bias  0.1623 0.8712    
## Positive Sign Bias  0.4003 0.6892    
## Joint Effect        1.5536 0.6700    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     17.78       0.5372
## 2    30     22.70       0.7903
## 3    40     29.60       0.8614
## 4    50     36.90       0.8983
## 
## 
## Elapsed time : 0.122936

32 EDA Results

# There seem to be major economic events that impact the prices of every commodity. These include sharp increases and decreases in prices before a leveling out. However, there do not appear to be significant price changes based on weekday. Commodity pricing did see some sharp changes during the years 2008 to 2012, which deserve additional exploration.

#In general, the 90's showed steady commodity prices that were even on a bit of a downward trend. Things began to shift around 2002: most commodities saw a gradual incline, followed by sharp spikes in 2008, again in 2011, and once more around 2020 and 2022. Common factors behind the price increases include the declining value of the US dollar, rising energy prices, and crop yields.

#Demand for products has been increasing as the world population grows. Developing countries and markets tend to consume more commodities, which raises prices on global markets. After the 2008 recession, demand increased alongside continued currency depreciation, leading to price spikes.

#The current economic situation is unique and has led to another commodity price surge for a number of reasons. High inflation continues to devalue currency and push prices up. A sharp increase in demand and manufacturing, following COVID-initiated supply disruptions, has also impacted prices. Weather and inaccurate forecasting contributed to those price hikes as well.

#The biggest industries that use similar commodities to Swire are food, fuel, and clothing. A disruption in demand from any one of these will impact commodity prices for Swire. For instance, the fuel industry uses a great deal of corn and sugar in its production, and it is particularly vulnerable to economic events, which makes price forecasting difficult.

# As I explored the data, I realized more and more that there is a great deal of freedom in it and always something additional to look at. I have further questions and plots I'd like to create, and I will continue to add to this during the semester as I keep working on the project.

#At this point, no ethical concerns were uncovered.

33 Model Performance Evaluation

# The ARIMA models allowed for the most flexibility in parameter selection. I was also able to predict using 30 years of data or just the more recent (3-6 years) pricing data. For the commodities sugar and coffee, using more recent data actually improved predictions of future pricing; a possible explanation is that recent data is more relevant to predicting near-future prices. The auto-generated ARIMA models tended to be less accurate and have very poor predictive power. For this reason, hand-selecting the p, d, and q values based on the autocorrelation and partial autocorrelation plots led to the best results.
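The hand-selection workflow described above can be sketched on any monthly series. A minimal base-R illustration, shown on the built-in AirPassengers data rather than the commodity series (the same steps apply to `cotton_ts`):

```r
# Illustrative ARIMA order selection: difference to stationarity, then read
# candidate p and q off the correlation plots and compare candidates on AIC
y  <- log(AirPassengers)   # stand-in monthly series
dy <- diff(y)              # one difference (d = 1) removes the trend
acf(dy)                    # MA order q: lag where the ACF cuts off
pacf(dy)                   # AR order p: lag where the PACF cuts off

fit <- arima(y, order = c(1, 1, 1))  # candidate orders read off the plots
AIC(fit)                             # compare candidates; keep the lowest
```

The hypothetical (1,1,1) order here is only an example of a candidate read from the plots, not the order used for the cotton models.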

# The ETS models weight more recent data most heavily in their predictions. As a result, slicing to recent data does little to change the predictions. The predictions were worse than the ARIMA models' and the confidence regions were much larger. Compared with the ARIMA and GARCH models, the ETS model had little upside.

# The GARCH model proved extremely valuable for predicting commodity pricing. Commodities are sensitive to market changes, which makes prices hard to predict. The GARCH model explicitly accounts for this volatility, making it better suited to financial data. The GARCH models consistently captured market volatility much better than the ARIMA model, though they were slightly less accurate during regular market conditions.

# Based on the prediction errors (RMSE and MAE) and the residuals, the ARIMA and GARCH models are the best for prediction. The ARIMA model is valuable in tandem with the GARCH models: ARIMA is the easiest model to tune, making it a great choice for analysis and for building forecasts with varying levels of confidence, while the GARCH model controls for the volatility in this specific type of data, making it a great choice for predictions in times of uncertainty. My recommendation would be to use both models when making business decisions based on commodity prices.
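One simple way to operationalize the tandem recommendation is to pair the mean forecast path with volatility-based bands from the GARCH fit. A sketch, assuming the `cotton_garch2` fit above (whose mean equation is the AR(1)); the 1.96 multiplier is a rough normal approximation, not exact for the fitted Student-t errors:

```r
# Mean path plus approximate volatility bands from the GARCH forecast
fc    <- ugarchforecast(cotton_garch2, n.ahead = 12)
point <- as.numeric(fc@forecast$seriesFor)  # mean forecast path
vol   <- as.numeric(fc@forecast$sigmaFor)   # conditional standard deviation

upper <- point + 1.96 * vol  # approximate 95% band, widening with volatility
lower <- point - 1.96 * vol
cbind(lower, point, upper)
```

Bands built this way widen in volatile periods and narrow in calm ones, which is the practical advantage the GARCH model adds over a fixed-width ARIMA interval.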

34 Model Results and Business Problem

# Having accurate price predictions will help Swire with its business strategy. Knowing the likely direction of price, higher or lower, will inform Swire of when to purchase and how much volume to purchase; the ideal is to purchase commodities at the lowest price to maximize margins. The models have proven accurate enough to give directional information several months in advance, and the forecast price range a few months out is relatively small. Considering how much information these models can contribute to purchase planning, they go a long way toward solving the business problem.